
Site Reliability Engineering Manager

hackajob

London

On-site

GBP 70,000 - 110,000

Full time

Yesterday


Job summary

An established industry player is seeking a Site Reliability Engineering Manager to oversee the technology platforms that power their website. This role involves ensuring high availability and performance standards while leading a team to migrate services to Google Cloud. The ideal candidate will have extensive experience in managing technical operations and a strong understanding of DevOps principles. You will drive continuous improvement initiatives and collaborate closely with engineering teams to enhance service reliability. If you thrive in high-pressure environments and are passionate about technology, this opportunity is perfect for you.

Qualifications

  • Experience managing engineers in website infrastructure and web services.
  • Deep understanding of DevOps and SRE principles.
  • Strong operational awareness and leadership skills.

Responsibilities

  • Ensure the team maintains a healthy, resilient, and secure infrastructure.
  • Create and manage technical plans for cloud migration.
  • Optimize service health in collaboration with engineering teams.

Skills

Site Reliability Engineering
Cloud Migration
DevOps Principles
Technical Operations Management
Incident Management
Team Leadership

Tools

Google Cloud Platform
Google Kubernetes Engine
GitLab
Jira
Confluence
Elastic APM
F5
VMware
Java
Python

Job description

The Platform and Reliability Engineering Team are responsible for the technology platforms and services that underpin the Rightmove website, ensuring it is available, secure and performing to a world-class standard. We strive to deliver annual availability of at least 99.99% (less than 5 minutes of downtime a month).
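As a rough check on the 99.99% availability target, the sketch below converts an availability fraction into the maximum downtime it permits over a given period. It assumes a 30-day month and a 365-day year, and the function name `downtime_minutes` is hypothetical, not taken from Rightmove's tooling.

```python
def downtime_minutes(availability: float, period_days: float) -> float:
    """Return the maximum downtime, in minutes, allowed by an availability
    target over `period_days` days. `availability` is a fraction, e.g. 0.9999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    total_minutes = period_days * 24 * 60
    # The unavailable fraction of the period is the downtime allowance.
    return total_minutes * (1.0 - availability)


if __name__ == "__main__":
    target = 0.9999  # 99.99%, the target stated in the job description
    # Assumption: a 30-day month and a 365-day year.
    print(f"Per 30-day month: {downtime_minutes(target, 30):.2f} min")
    print(f"Per 365-day year: {downtime_minutes(target, 365):.2f} min")
```

Under these assumptions, 99.99% allows about 4.3 minutes of downtime per 30-day month and about 52.6 minutes per year, which is consistent with the "less than 5 minutes a month" figure.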

The Site Reliability Engineering Manager’s focus is to ensure their teams maintain our datacentre and cloud website infrastructure, safely migrate services to Google Cloud, and enable others to easily manage the reliability of production services across the Rightmove Website Estate.

A typical week as the Site Reliability Engineering Manager might involve:

· Ensuring the right people, process and tooling are in place to maintain a healthy, resilient, and secure datacentre and cloud website platform.

· Creating and managing technical plans for the migration of applications and infrastructure to Google Cloud.

· Developing cloud engineering and operations skills within your teams.

· Working through the supplier due diligence process for support contract renewals to ensure key components are kept in support.

· Working with engineering managers, product owners, and engineers to optimise and improve service health.

· Identifying, planning and implementing improvements to the incident management process.

· Reducing handoffs or improving flow and lead times within development teams by providing operational and infrastructure support for the platform.

We’re looking for someone who:

· Has experience managing engineers who build and run website infrastructure and web services, and experience running website technical operations.

· Is highly operationally aware and understands what it takes to maintain healthy website infrastructure and services.

· Is an experienced manager who understands how to get the best out of their people and teams.

· Has excellent judgement and can instil this in engineers, leading them to the best outcomes on technical decisions and architecture whilst enabling their development.

· Is happy to dive deep into technical discussions with their team and can surface risks and issues relating to projects.

· Is able to keep calm and work effectively in high-pressure situations.

· Has experience migrating infrastructure and web services from datacentres to the cloud.

· Has deep experience and understanding of DevOps and SRE principles and practices.

· Always pushes for continuous improvement and has strong attention to detail.

Relevant Technology we use:

· F5, Juniper, Arbor

· VMware, HP 3PAR

· Google Cloud Platform

· Google Kubernetes Engine with Anthos Service Mesh

· Confluent Cloud

· Incident.io

· GitLab

· Jira, Confluence, Slack, Teams

· Elastic APM, Kibana

· Eggplant Monitoring, Xymon

· Java, Node, Python, JavaScript, Go


Similar jobs

Site Reliability Engineering Lead

JR United Kingdom

London

Hybrid

GBP 60,000 - 100,000

4 days ago

Site Reliability Engineering Manager

ZipRecruiter

London

On-site

GBP 70,000 - 110,000

2 days ago

Site Reliability Engineering Manager

JR United Kingdom

London

On-site

GBP 70,000 - 110,000

10 days ago

Site Reliability Engineering Manager

TN United Kingdom

London

On-site

USD 60,000 - 100,000

8 days ago

Site Reliability Engineering Lead

MarkJames Search

Greater London

Hybrid

GBP 60,000 - 100,000

8 days ago

Senior Site Reliability Engineering Manager, Production Engineering

ThousandEyes (part of Cisco)

London

Hybrid

GBP 60,000 - 100,000

Yesterday

Senior Site Reliability Engineering Manager, Production Engineering New London, Greater London,[...]

ThousandEyes

London

Hybrid

GBP 80,000 - 120,000

2 days ago

Senior Site Reliability Engineering Manager, Production Engineering

ThousandEyes

London

Hybrid

GBP 80,000 - 120,000

Yesterday

Site Reliability Engineering Manager

Cboe

London

Hybrid

GBP 60,000 - 100,000

24 days ago