Lead Site Reliability Engineer

Holland & Barrett

Tetford

On-site

GBP 65,000 - 85,000

Full time

3 days ago

Job summary

A leading health and wellness company in the UK is seeking a Senior Site Reliability Engineer. In this role, you will ensure the high-quality delivery of software by building and maintaining developer tools, championing automation, and leading incident responses. The ideal candidate has extensive experience in cloud infrastructure and operational engineering, along with strong coding skills in Python, Go, or similar. This position offers opportunities for technical leadership and collaboration across teams.

Qualifications

5-8+ years in SRE, Platform, Cloud Infrastructure, or operational engineering.
Hands-on experience architecting large-scale, distributed systems.
Strong coding proficiency in Python, Go, Bash, or similar.

Responsibilities

Architect and improve cloud-native systems focusing on reliability.
Lead high-severity incident responses with clarity and technical skill.
Build tools and automation to enhance engineering efficiency.

Skills

Cloud Infrastructure

Automation

Team Collaboration

Incident Response

Monitoring

Troubleshooting

Tools

AWS

Terraform

Datadog

Prometheus

Grafana

Senior Site Reliability Engineer

As a Senior Site Reliability Engineer, you will ensure the high-quality delivery of our software by building and maintaining the tools that software engineers and data scientists use to deploy and monitor their code. In this role, you will be a champion of automation, reliability, and operational excellence.

Technical Leadership That Raises the Bar

Architect and improve cloud-native systems with reliability as a first-class principle.
Shape SLIs/SLOs, error budgets, capacity planning, and performance strategies.
Continuously evolve availability, efficiency, and resilience across our platforms.
Mentor SREs, platform engineers, and developers across the organisation.
Champion automation, observability, DevSecOps, and modern operational practices.
Influence engineering culture and architectural direction.

Operational Excellence

Own and lead high-severity incident response with calm, clarity, and technical depth.
Run world-class post-incident reviews and drive meaningful, measurable improvements.
Strengthen monitoring, alerting, on-call practices, and reliability processes.
Support resilience validation through load testing, stress testing, and chaos engineering.

Automation, Tooling & Engineering Efficiency

Build tools and automation that remove toil and accelerate teams.
Develop CI/CD pipelines and Infrastructure-as-Code environments.
Drive consistency, repeatability, and self-service across engineering.

Cross-Team Collaboration

Partner with Security, Platform, and Engineering teams to align reliability with security and resilience goals.
Lead teams toward better design, operational readiness, and measurable service health.
Contribute to documentation, runbooks, and operational processes that scale.

The security engineering team's mission is to build security services, platforms, and technologies, and to support cross-functional teams in protecting our users, products, and infrastructure.

Qualifications

5-8+ years in SRE, Platform, Cloud Infrastructure, or operational engineering roles.
Hands‑on experience architecting and improving large‑scale, distributed systems.
Strong coding proficiency in Python, Go, Bash, or similar automation‑focused languages.
Expertise with observability stacks: Datadog, Prometheus, Grafana, OpenTelemetry.
Deep AWS experience across EC2, EKS, Lambda, VPC, DynamoDB, S3, CloudFront, RDS, IAM, KMS, and more.
Proficiency with Terraform, CloudFormation, or AWS CDK.
Incident response leadership and root‑cause analysis expertise.
Excellent documentation and communication skills.
Strong analytical and troubleshooting abilities.

Bonus

Experience mentoring or leading engineers within SRE or platform teams.
Experience with load testing, stress testing, and chaos engineering.
A passion for uplifting engineering culture through tooling, automation, and reliability‑first thinking.
