AWS Head of Site Reliability Engineering (Must hold current SC)

ZipRecruiter

London

On-site

GBP 80,000 - 130,000

Full time

Today

Job summary

A leading consultancy seeks a Head of Site Reliability Engineering focusing on AWS infrastructure. The role involves team leadership, overseeing cloud management, and implementing SRE best practices. This position is ideal for a visionary leader with extensive AWS expertise and a commitment to operational excellence.

Benefits

Flexible work environment

Private medical insurance

Company pension plan

25 days annual leave plus UK bank holidays

Access to Perkbox rewards platform

Generous employee referral program

Qualifications

8+ years in Site Reliability Engineering or similar roles.
2+ years in a leadership position.
Experience with AWS services (EC2, S3, Lambda, etc.) and SRE best practices.

Responsibilities

Lead the SRE team to ensure high availability and performance.
Implement SRE principles and manage cloud infrastructure in AWS.
Drive performance optimization and automation initiatives.

Skills

AWS Expertise

Incident Management

Team Leadership

Automation Tools

Certifications

AWS Certified Solutions Architect – Professional

AWS Certified DevOps Engineer

Tools

Terraform

CloudFormation

Jenkins

Job Description

AWS Head of Site Reliability Engineering (Must hold current SC)

The Company:

Amber Labs is a UK and European technology consultancy that prioritises autonomy, experimentation, and rapid learning to deliver exceptional value to our clients. Our culture is centred on collaboration: all colleagues, regardless of role, work together to minimise risk and shorten delivery times. Our team consists of highly skilled cross-functional consultants, analysts, and support staff.

Overview:

We are looking for a highly skilled and visionary leader to join our team as the Head of Site Reliability Engineering (SRE) with a strong focus on AWS cloud infrastructure. The ideal candidate will have a deep understanding of cloud architectures, extensive experience in SRE practices, and the ability to lead and scale SRE teams to ensure the availability, performance, and security of our systems.

Key Responsibilities:

Leadership and Team Management: Lead and manage the SRE team to ensure high availability, scalability, and performance of our AWS-based infrastructure. Provide mentorship and guidance to junior and senior engineers, fostering a culture of operational excellence and continuous improvement.
Cloud Infrastructure Management: Oversee the design, implementation, and maintenance of cloud infrastructure in AWS, ensuring the systems are secure, reliable, and highly available. Use best practices for AWS services, automation, and monitoring.
SRE Practices Implementation: Establish and lead the implementation of SRE principles, such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets, to drive the team's focus on reliability.
Incident Management: Lead incident response efforts, root cause analysis (RCA), and post-incident reviews to improve system reliability. Ensure rapid response to production issues and minimize downtime.
Performance Optimization: Drive initiatives for performance tuning, cost optimization, and efficient use of AWS resources. Ensure the infrastructure can scale to meet the demands of the business.
Automation and Continuous Improvement: Champion the automation of manual tasks, such as deployments, monitoring, and scaling, using tools like Terraform, CloudFormation, Jenkins, and other CI/CD platforms.
Collaboration: Work closely with cross-functional teams (Engineering, DevOps, Security, etc.) to ensure seamless collaboration in achieving business and technical goals.
Monitoring and Alerts: Implement and maintain robust monitoring, alerting, and logging systems to detect issues before they impact the business, using AWS CloudWatch, Prometheus, Grafana, etc.
Cost Management: Help optimize AWS costs while maintaining operational efficiency and reliability.

Required Qualifications:

Experience: 8+ years of experience in Site Reliability Engineering, DevOps, or similar roles, with at least 2 years in a leadership position.
AWS Expertise: Extensive experience with AWS services, such as EC2, S3, Lambda, RDS, VPC, CloudFormation, CloudWatch, etc. Hands-on experience with cloud architecture and design.
SRE Best Practices: Deep understanding of SRE principles and frameworks, including SLOs, SLIs, and Error Budgets.
Incident Management: Proven experience in incident management, including response, recovery, root cause analysis, and post-mortem reporting.
Automation Tools: Proficient in automation tools like Terraform, CloudFormation, Jenkins, and other CI/CD tools.

Qualifications:

Certifications: AWS Certified Solutions Architect – Professional, AWS Certified DevOps Engineer, or other relevant certifications.
Agile Methodologies: Experience with Agile and Lean practices in a cloud environment.

Benefits:

Competitive salary and performance-based bonus structure.
Join a rapidly expanding start-up where personal growth is a part of our DNA.
Benefit from a flexible work environment focused on deliverable outcomes.
Receive private medical insurance through Aviva.
Enjoy the benefits of a company pension plan through Nest.
25 days of annual leave plus UK bank holidays.
Access Perkbox, a global employee rewards platform offering discounts, perks, and wellness resources.
Participate in a generous employee referral program.
A highly collaborative and collegial environment with opportunities for career advancement.
Be encouraged to take bold steps and embrace a mindset of experimentation.
Choose your device: PC or Mac.

Diversity & Inclusion:

Here at Amber Labs, we are dedicated to fostering an inclusive and equitable workplace for all. Our commitment to diversity, equality, and inclusion includes:

Valuing the unique experiences, perspectives, and backgrounds of all employees and creating an environment where everyone feels welcomed, respected, and valued.

Prohibiting all forms of harassment, bullying, discrimination, and victimisation and promoting a culture of dignity and respect for all.

Educating all new hires on our diversity and inclusion policies and ensuring they are aware of their rights and responsibilities to create a safe and inclusive workplace.

By taking these steps, we are dedicated to building a workplace that reflects and celebrates the diversity of our employees and communities.

This role at Amber Labs is a 12-month fixed-term contract (FTC). All employees are required to meet the Baseline Personnel Security Standard (BPSS) and hold current Security Check (SC) clearance. Please be advised that, at this time, we are unable to consider candidates who require sponsorship or hold a visa of any type.

What Happens Next?

Our Talent Acquisition Team will be in touch to advise you on the next steps. Most of our roles have a two-stage interview process. Where necessary, we may add a third and final stage: a conversation with the company Partners.
