Site Reliability Engineer

NatWest Group

London

On-site

GBP 150,000 - 200,000

Full time

Today

Job summary

An established industry player is seeking a Site Reliability Engineer to enhance the reliability and performance of its systems. This role involves defining service level objectives, improving automation processes, and providing technical expertise to ensure a seamless operational environment. You'll collaborate with diverse teams, driving innovation while managing risks effectively. This inclusive team values professional development and offers an exciting opportunity to make a significant impact in a dynamic environment. If you're passionate about reliability engineering and eager to contribute to a forward-thinking organization, this role is for you.

Qualifications

Strong knowledge of reliability systems and site reliability engineering.
Experience with programming, automation, and observability tools.

Responsibilities

Improve system reliability and define SLOs, SLIs, and error budgets.
Coach colleagues and manage production and non-production environments.

Skills

Reliability systems thinking

Site reliability engineering

Programming languages

Deploy and release services

Automation

Troubleshooting

Communication skills

Observability tools (Grafana, Prometheus, OpenTelemetry)

Public cloud environments (AWS, GCP)

Infrastructure as Code (Terraform)

Join us as a Site Reliability Engineer

In this key role, you’ll improve, drive, and embed non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services.
You’ll enjoy significant stakeholder interaction, working in collaboration with engineers and product owners to ensure a principled approach to delivering change in a safe and secure way.
This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development.

What you'll do

As our Site Reliability Engineer, you’ll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve system and environment reliability. You’ll define SLOs, SLIs and error budgets that support finding the right balance between risk, reliability and continuous improvement.

You’ll also provide structure and help to our release process, suggesting and making improvements where possible. You’ll scale systems sustainably through mechanisms like automation, evolving them by pushing for changes that improve reliability and velocity. We’ll also look to you to coach and provide guidance to colleagues and the wider team, leading where required.

In addition to this, you’ll:

Proactively contribute new ideas and innovations to meet short-term and longer-term goals
Continually balance and manage any potential risks
Be accountable for the day-to-day development and health of both production and non-production environments and respond to any incidents as required
Provide technical expertise and input to establish the risk tolerance of products and services
Communicate incident status updates clearly and frequently to other teams, customers and stakeholders and support blameless post-mortems

The skills you'll need

We’re looking for someone with strong knowledge of reliability systems thinking and experience of site reliability engineering. You’ll need experience of using a data-driven and scientific approach to fact finding. We’ll also look for financial services knowledge, and the ability to identify wider business impact, risk and opportunity, and make connections across key outputs and processes.

We’re also looking for:

Good knowledge and experience of programming languages
Strong knowledge of deploy and release services, automation, and troubleshooting
Experience of utilising tools and technology across the software development lifecycle
Experience using mathematical and statistical models to assess trends
Strong communication skills with the ability to proactively engage with a wide range of stakeholders
In-depth experience with observability tools such as Grafana, Prometheus and OpenTelemetry
Strong knowledge of public cloud environments such as AWS and GCP, and Infrastructure as Code tools such as Terraform
