Senior Site Reliability Engineer
As a Senior Site Reliability Engineer, you will ensure the high-quality delivery of our software by building and maintaining the tools that software engineers and data scientists use to deploy and monitor their code. In this role, you will be a champion of automation, reliability, and operational excellence.
Technical Leadership That Raises the Bar
- Architect and improve cloud-native systems with reliability as a first-class principle.
- Shape SLIs/SLOs, error budgets, capacity planning, and performance strategies.
- Continuously improve availability, efficiency, and resilience across our platforms.
- Mentor SREs, platform engineers, and developers across the organisation.
- Champion automation, observability, DevSecOps, and modern operational practices.
- Influence engineering culture and architectural direction.
Operational Excellence
- Own and lead high-severity incident response with calm, clarity, and technical depth.
- Run world-class post-incident reviews and drive meaningful, measurable improvements.
- Strengthen monitoring, alerting, on-call practices, and reliability processes.
- Support resilience validation through load testing, stress testing, and chaos engineering.
Automation, Tooling & Engineering Efficiency
- Build tools and automation that remove toil and accelerate teams.
- Develop CI/CD pipelines and Infrastructure-as-Code environments.
- Drive consistency, repeatability, and self-service across engineering.
Cross-Team Collaboration
- Partner with Security, Platform, and Engineering teams to align reliability with security and resilience goals.
- Lead teams toward better design, operational readiness, and measurable service health.
- Contribute to documentation, runbooks, and operational processes that scale.
The security engineering team's mission is to build security services, platforms, and technologies, and to support cross-functional teams in protecting our users, products, and infrastructure.
Qualifications
- 5-8+ years in SRE, Platform, Cloud Infrastructure, or operational engineering roles.
- Hands‑on experience architecting and improving large‑scale, distributed systems.
- Strong coding proficiency in Python, Go, Bash, or similar automation‑focused languages.
- Expertise with observability stacks: Datadog, Prometheus, Grafana, OpenTelemetry.
- Deep AWS experience across EC2, EKS, Lambda, VPC, DynamoDB, S3, CloudFront, RDS, IAM, KMS, and more.
- Proficiency with Terraform, CloudFormation, or AWS CDK.
- Incident response leadership and root‑cause analysis expertise.
- Excellent documentation and communication skills.
- Strong analytical and troubleshooting abilities.
Bonus
- Experience mentoring or leading engineers within SRE or platform teams.
- Experience with load testing, stress testing, and chaos engineering.
- A passion for uplifting engineering culture through tooling, automation, and reliability‑first thinking.