Lead Site Reliability Engineer | Copperleaf

IFS

England

Hybrid

GBP 70,000 - 90,000

Full time

Today

Job summary

A leading software company is seeking a mid-senior level Lead Site Reliability Engineer specializing in Azure. You will design, implement, and enhance Azure-based infrastructure to ensure high availability and reliability. Responsibilities include automating processes, mentoring junior engineers, and driving initiatives for infrastructure optimization. The ideal candidate has at least 5 years of experience in the field, with a strong focus on Azure services and cloud operations. The role offers flexible, hybrid working in a supportive environment.

Qualifications

5 years' experience in SRE, Cloud Operations, or DevOps roles, with at least 3 years focused on Microsoft Azure.
Deep expertise in Azure services including App Services, AKS, and Azure SQL.
Strong automation and scripting skills with PowerShell, Python, or Bash.

Responsibilities

Lead the design, implementation, and continuous improvement of Azure-based infrastructure.
Automate deployment pipelines using Azure DevOps, ARM / Bicep, Terraform.
Drive root cause analysis and resolution of complex production incidents.

Skills

Kubernetes

Continuous Improvement

Troubleshooting

Tools

Terraform

Azure DevOps

ARM / Bicep

IFS Copperleaf software helps some of the world's largest energy firms make better strategic decisions.

Our Cloud Operations Team, a crucial component of our Software as a Service (SaaS) offering, also delivers Infrastructure as a Service (IaaS) to IFS Copperleaf. Built on the foundation of Site Reliability Engineering, the team is expanding. We are committed to the reliability and uptime of our services, and we consistently aim to automate processes and minimize manual labor. We are currently seeking a mid-senior level cloud engineer to contribute to these services and help enhance the operational aspects of each one.

As a Lead Site Reliability Engineer (SRE) specializing in Azure, you will play a pivotal role in architecting, operating, and optimizing our cloud infrastructure. You will lead initiatives to ensure the reliability, scalability, and security of our Azure-based SaaS offerings. You’ll mentor junior engineers, drive automation, and partner with development teams to deliver robust, high-availability solutions.

Key Responsibilities

Lead the design, implementation, and continuous improvement of Azure-based infrastructure for high-availability, mission-critical SaaS services.
Architect and automate deployment pipelines using Azure DevOps, ARM / Bicep, Terraform and related tools.
Own and enhance monitoring, alerting, and incident response for Azure resources (App Services, AKS, SQL, Storage, Networking, etc.).
Drive root cause analysis and resolution of complex production incidents, collaborating across teams.
Define and enforce SLOs, SLIs and SLAs for Azure-hosted SaaS services.
Champion security best practices, including identity, access, secrets, and certificate management in Azure.
Mentor and coach junior SREs and CloudOps engineers.
Partner with development teams to embed reliability and operational excellence into the SDLC.
Evaluate and implement new Azure features and services to improve reliability, performance and cost efficiency.
Document architecture, runbooks, and operational procedures for Azure environments.

Qualifications

5 years' experience in SRE, Cloud Operations, or DevOps roles, with at least 3 years focused on Microsoft Azure.
Deep expertise in Azure services (App Services, AKS, Azure SQL, Storage, Networking, Security Center, Monitor, etc.).
Strong automation and scripting skills (PowerShell, Python, Bash or similar).
Proven experience with Infrastructure as Code (Terraform, ARM / Bicep).
Advanced troubleshooting of distributed systems, networking and application performance in Azure.
Solid understanding of microservices, container orchestration (Kubernetes / AKS) and CI / CD pipelines.
Experience with monitoring, logging and observability tools (Azure Monitor, Log Analytics, Application Insights).
Strong grasp of security protocols, certificate and secret management, and compliance in Azure.
Demonstrated ability to lead incident response and post-mortem analysis.
Excellent communication skills and a passion for mentoring others.

Preferred Qualifications

Azure certifications (e.g. Azure Solutions Architect, Azure DevOps Engineer).
Experience with hybrid or multi-cloud environments, including AWS.
Familiarity with cost management and optimization in Azure.
Experience supporting large-scale SaaS platforms.

Additional Information

We embrace flexibility and hybrid work opportunities to support diverse needs and lifestyles, while also valuing inclusive workplace experiences. By fostering a sense of community, we drive innovation, strengthen connections, and nurture belonging. Our commitment ensures you can work in a way that suits you best while also engaging with colleagues to share ideas and build meaningful relationships.

Key Skills

Kubernetes
FMEA
Continuous Improvement
Elasticsearch
Go
Root cause Analysis
Maximo
CMMS
Maintenance
Mechanical Engineering
Manufacturing
Troubleshooting

Vacancy: 1
