Site Reliability Engineering Manager

hackajob

London

On-site

GBP 70,000 - 110,000

Full time

Yesterday

Job summary

An established industry player is seeking a Site Reliability Engineering Manager to oversee the technology platforms that power their website. This role involves ensuring high availability and performance standards while leading a team to migrate services to Google Cloud. The ideal candidate will have extensive experience in managing technical operations and a strong understanding of DevOps principles. You will drive continuous improvement initiatives and collaborate closely with engineering teams to enhance service reliability. If you thrive in high-pressure environments and are passionate about technology, this opportunity is perfect for you.

Qualifications

Experience managing engineers in website infrastructure and web services.
Deep understanding of DevOps and SRE principles.
Strong operational awareness and leadership skills.

Responsibilities

Ensure the team maintains a healthy, resilient, and secure infrastructure.
Create and manage technical plans for cloud migration.
Optimize service health in collaboration with engineering teams.

Skills

Site Reliability Engineering

Cloud Migration

DevOps Principles

Technical Operations Management

Incident Management

Team Leadership

Tools

Google Cloud Platform

Google Kubernetes Engine

GitLab

Jira

Confluence

Elastic APM

VMware

Java

Python

The Platform and Reliability Engineering Team are responsible for the technology platforms and services that underpin the Rightmove website, ensuring it is available, secure and performing to a world-class standard. We strive to deliver annual availability of at least 99.99% (less than 5 minutes of downtime a month).
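
As a rough check on that figure, the downtime allowed by an availability target can be worked out directly. The sketch below is illustrative only: the function name and the 30-day month are assumptions, not part of Rightmove's tooling or its definition of availability.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the downtime (in minutes) allowed by an availability target.

    availability_pct: target such as 99.99 (percent).
    period_days: length of the measurement window in days (assumed value).
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Assumed windows: a 30-day month and a 365-day year.
monthly = downtime_budget_minutes(99.99, 30)
yearly = downtime_budget_minutes(99.99, 365)
print(f"{monthly:.2f} min/month, {yearly:.2f} min/year")
```

Under these assumptions, 99.99% works out to roughly 4.3 minutes a month, or about 52.6 minutes over a year, which is consistent with the "less than 5 minutes a month" figure quoted above.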

The Site Reliability Engineering Manager’s focus is to ensure their teams maintain our datacentre and cloud website infrastructure, safely migrate services to Google Cloud, and enable others to easily manage the reliability of production services across the Rightmove Website Estate.

A typical week as the Site Reliability Engineering Manager might involve:

· Ensuring the right people, process and tooling are in place to maintain a healthy, resilient, and secure datacentre and cloud website platform.

· Creating and managing technical plans for the migration of applications and infrastructure to Google Cloud.

· Developing cloud engineering and operations skills within your teams.

· Working through the supplier due diligence process for support contract renewals to ensure key components are kept in support.

· Working with engineering managers, product owners, and engineers to optimise and improve service health.

· Identifying, planning and implementing improvements to the incident management process.

· Reducing handoffs and improving flow and lead times within development teams by providing operational and infrastructure support for the platform.

We’re looking for someone who:

· Has previous experience managing engineers who build and run website infrastructure and web services, as well as previous experience running website technical operations.

· Is highly operationally aware, understanding what it takes to maintain healthy website infrastructure and services.

· Is an experienced manager who understands how to get the best out of their people and teams.

· Has excellent judgement and can instill this in engineers, leading them to the best outcomes on technical decisions and architecture whilst enabling their development.

· Is happy to dive deep into technical discussions with their team and can surface risks and issues relating to projects.

· Is able to keep calm and work effectively in high-pressure situations.

· Has experience migrating infrastructure and web services from datacentres to cloud.

· Has deep experience and understanding of DevOps and SRE principles and practices.

· Always pushes for continuous improvement and has strong attention to detail.

Relevant Technology we use:

· F5, Juniper, Arbor

· VMware, HP 3PAR

· Google Cloud Platform

· Google Kubernetes Engine with Anthos Service Mesh

· Confluent Cloud

· Incident.io

· GitLab

· Jira, Confluence, Slack, Teams

· Elastic APM, Kibana

· Eggplant Monitoring, Xymon

· Java, Node, Python, JavaScript, Go
