Site Reliability Engineering Manager

ZipRecruiter

London

On-site

GBP 70,000 - 110,000

Full time

30+ days ago

Job summary

An innovative firm is seeking a Site Reliability Engineering Manager to lead its European operations. This pivotal role involves ensuring the reliability and performance of critical infrastructure while collaborating with cross-functional teams. The ideal candidate will possess strong technical leadership and a passion for operational excellence, driving initiatives that enhance system availability and scalability. You will mentor a regional team, oversee incident responses, and lead automation efforts to streamline workflows. Join this high-impact team and make a significant contribution to the company's success in delivering secure and reliable platforms.

Qualifications

  • 7+ years in a technical SRE or DevOps position.
  • 2+ years in a leadership or senior engineering role.

Responsibilities

  • Lead and mentor a high-performing SRE team in Europe.
  • Champion initiatives that enhance system availability and performance.

Skills

Python programming
SQL and data analytics tools
FIX protocol and market data analysis
AWS
Kubernetes
Monitoring tools (Datadog, Prometheus, Grafana)
Automation frameworks (Terraform, Ansible, Pulumi)

Education

Bachelor’s degree in Computer Science, Engineering, or related field
Master’s degree (preferred)

Tools

Jira

Job description

The SRE Manager is responsible for leading the Site Reliability Engineering function across Europe, ensuring the reliability, scalability, and performance of critical infrastructure and services. This role plays a key part in the global follow-the-sun support model, working closely with the Global SRE Leader to support platforms worldwide.

The ideal candidate will bring strong technical leadership, deep subject matter expertise, and a passion for operational excellence to a high-impact team. You'll collaborate with Engineering, Infrastructure, and Operations teams to maintain high availability and resilient service delivery, while also mentoring a regional SRE team focused on continuous improvement and innovation.

Key Responsibilities:

Technical Leadership

  • Develop deep expertise in the Titanium trading platform to lead and support critical business operations.
  • Oversee team workload, ensuring priorities align with business goals and resource capacity.

Operational Excellence

  • Champion initiatives that enhance system availability, scalability, and performance.
  • Collaborate with the Global SRE Leader to refine and enforce operational policies (e.g., Capacity Planning, Change Management, Disaster Recovery).

Cross-Functional Collaboration

  • Partner with Software Engineering, Infrastructure, Operations, Security, and Business teams to deliver secure and reliable platforms.

Team Development

  • Build, lead, and mentor a high-performing SRE team in Europe, fostering a culture of ownership, collaboration, and innovation.

Incident Response & Postmortems

  • Lead response efforts for critical incidents, ensuring swift resolution and comprehensive root cause analysis.
  • Drive long-term improvements based on lessons learned from Learning Reviews, and maintain accurate incident documentation and compliance reporting.

Automation & Efficiency

  • Lead automation initiatives to streamline workflows and increase uptime.
  • Use Jira to manage tasks and projects, and align global SRE practices for seamless support.

Capacity Planning

  • Drive timely capacity planning to prevent last-minute issues.
  • Support budget planning to align infrastructure investments with growth and performance targets.
  • Participate in quarterly capacity reviews and follow up on outcomes.

Monitoring & Analytics

  • Oversee the implementation of monitoring and alerting systems that detect and resolve issues proactively, before customers or compliance are affected.

Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred)
  • 7+ years in a technical SRE or DevOps position
  • 2+ years in a leadership or senior engineering capacity

Skills:

  • Strong Python programming skills
  • Proficiency in SQL and data analytics tools (e.g., Sigma, Snowflake)
  • Experience with FIX protocol and market data analysis
  • Proficiency in AWS, Kubernetes, monitoring tools (Datadog, Prometheus, Grafana), and automation frameworks (Terraform, Ansible, Pulumi)

For more information, please apply with a relevant CV.

