Manager, SRE

GroupM

London

On-site

GBP 60,000 - 100,000

Full time

30+ days ago

Job summary

An established industry player is seeking a dynamic Manager of Site Reliability Engineering to lead a talented team. This pivotal role involves ensuring the reliability and performance of systems while collaborating closely with engineering and operations teams. You will drive best practices, mentor team members, and lead initiatives to enhance system reliability. If you are passionate about shaping the future of technology and making a significant impact in the AdTech industry, this opportunity is perfect for you. Join a diverse and inclusive team that values innovation and collaboration.

Qualifications

  • Proven experience in SRE or DevOps with leadership experience.
  • Strong knowledge of cloud platforms and infrastructure technologies.

Responsibilities

  • Recruit and mentor top SRE talent while fostering collaboration.
  • Define reliability standards and improve incident management processes.

Skills

Site Reliability Engineering (SRE)
DevOps
Cloud Platforms (AWS, GCP, Azure)
Kubernetes
Docker
Terraform
Monitoring Tools (Prometheus, Grafana, Datadog, Splunk)
Programming/Scripting (Python, Go, Bash)
Networking
Distributed Systems

Education

Bachelor's Degree in Computer Science or related field

Tools

Prometheus
Grafana
Datadog
Splunk

Job description

WHO WE ARE

Choreograph is WPP’s global data products and technology company. We’re on a mission to transform marketing by building the fastest, most connected data platform that bridges marketing strategy to scaled activation.

We work with agencies and clients to transform the value of data by bringing together technology, data and analytics capabilities. We deliver this through the Open Media Studio, an AI-enabled media and data platform for the next era of advertising.

We’re endlessly curious. Our team of thinkers, builders, creators and problem solvers is over 1,000 strong, across 20 markets around the world.

WHO WE ARE LOOKING FOR

We are seeking an experienced and motivated Manager, Site Reliability Engineering (SRE) to lead and grow our team of SREs. This role is critical in ensuring the reliability, scalability, and performance of our systems and applications. As a Manager of SRE, you will collaborate closely with engineering, product, and operations teams to design, build, and maintain highly available and resilient infrastructure. You will drive best practices, mentor team members, and lead efforts to continuously improve system reliability and operational efficiency.

(Please note this is a UK-based role and requires individuals to have the right to work in this location.)

WHAT YOU WILL DO

  • Recruit, mentor, and retain top SRE talent.
  • Provide guidance and technical leadership to the SRE team.
  • Foster a culture of ownership, collaboration, and continuous learning.
  • Manage team performance, set clear goals, and conduct regular performance reviews.
  • Define and implement reliability standards.
  • Develop and improve incident management processes in alignment with engineering support, ensuring effective resolution and root cause analysis.
  • Drive proactive monitoring, alerting, and automation to minimize downtime and improve system reliability.
  • Lead efforts to eliminate single points of failure.
  • Collaborate with the DevOps practice to apply best practices that accelerate delivery.
  • Drive post-incident reviews and implement preventative measures.

WHAT YOU WILL NEED

  • Proven experience in SRE, DevOps, or related roles, with some experience in a leadership or managerial position.
  • Strong knowledge of cloud platforms (AWS, GCP, Azure) and modern infrastructure technologies (Kubernetes, Docker, Terraform).
  • Expertise in monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
  • Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash).
  • Deep understanding of networking, databases, and distributed systems.
  • Strong communication, collaboration, and problem-solving skills.
  • Experience with incident response, on-call rotations, and post-mortem processes.

If you are ready to be at the forefront of the AdTech industry, shaping its future, and driving success for both Choreograph and our clients, we encourage you to apply and join our team.

Choreograph is the beating heart of data inside WPP’s media investment group, GroupM, the world’s leading media investment company responsible for more than $60 billion in annual media investment. Discover more about Choreograph at www.choreograph.com

GroupM and all its affiliates embrace and celebrate diversity, inclusivity, and equal opportunity. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. We are a worldwide media agency network that represents global clients. The more inclusive we are, the more great work we can create together.
