AVP, Site Reliability Engineering Lead (SRE), Middle Office Technology, Technology & Operations

Vodafone

Singapore

On-site

SGD 80,000 - 120,000

Full time

24 days ago

Job summary

Vodafone seeks a skilled DevSecOps Site Reliability Engineer (SRE) to enhance infrastructure through software solutions and automation. The successful candidate will run production applications reliably, focusing on incident management and building self-healing systems in a collaborative global team environment.

Qualifications

Strong understanding of DevOps tools and practices required.
Experience in deploying applications in containerized environments is essential.
Skills in Python, Java, or Go for site reliability engineering are preferred.

Responsibilities

Develop, test, and debug automated tasks across various systems.
Troubleshoot incidents and facilitate post-mortem analyses.
Work with development teams to ensure sustainable software releases.

Skills

DevOps practices

Automation

Troubleshooting

Networking protocols

Tools

Docker

Kubernetes

AWS

Terraform

Git

Jenkins

Prometheus

Splunk

Elasticsearch

Grafana

Business Function

Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble, and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability, and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aim to delight our business partners through our multiple banking delivery channels.

The Regulatory Reporting team leads the overall front-to-back strategy and development of the regulatory reporting landscape and drives business changes for the Group Finance Platform (GFP).

Responsibilities:

As a DevSecOps Site Reliability Engineer (SRE), you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Your work will focus on optimizing existing systems, building infrastructure, and reducing manual work through automation. You'll join a team of curious problem solvers with diverse perspectives who think big and take risks. In this environment, you'll lead relevant projects with support from an organization that fosters learning and growth. Your focus as an SRE will be on running production applications and systems more effectively.

Develop, test, and debug automated tasks (Apps, Systems, Infrastructure)
Troubleshoot priority incidents and facilitate blameless post-mortems
Work with development teams throughout the software lifecycle to ensure sustainable software releases
Perform analytics on previous incidents and usage patterns to better predict issues and take proactive actions
Build and promote adoption of greater self-healing and resiliency patterns
Lead and participate in performance tests; identify bottlenecks, opportunities for optimization, and capacity demands
Adhere to architecture standards, risk management, and security policies
Collaborate with global teams, product owners, and business teams to develop, build, and support applications
Communicate and coordinate on development items with global teams and resolve issues impacting development
Provide post-production application support
Participate in quality assurance, peer reviews, and code reviews

Requirements:

Strong technical skills and a record of innovative solutions that benefit our global customers
Thought leadership in end-to-end solutioning
Up-to-date with industry technologies
Active engagement in developer and tech meetups
Strong understanding of DevOps practices, tools, and techniques
Experience deploying applications in containerized environments (Docker, Kubernetes, PCF, OpenShift, AWS)
In-depth OS experience (RHEL, Ubuntu, Windows Server) with troubleshooting skills
Experience in site reliability engineering using languages like Python, Java, PowerShell, Shell scripting, or Go
Hands-on experience with cloud technologies and with monitoring and observability tools such as Prometheus, Splunk, Elasticsearch, and Grafana
Proficiency with modern development tools and methodologies (Agile, CI/CD, Git, Terraform, Jenkins)
Good understanding of networking protocols and cybersecurity in cloud environments
Advanced SQL knowledge and experience with RDBMS, Hadoop, and NoSQL databases
Experience in root cause analysis and data processing to improve business processes
Knowledge of message queuing, stream processing, and scalable big data stores
Experience collaborating with cross-functional teams in dynamic environments