Site Reliability Engineer

NatWest Group

City of Edinburgh

Remote

GBP 50,000 - 90,000

Full time

Today

Job summary

Join a forward-thinking company as a Site Reliability Engineer, where you will enhance the reliability and performance of critical systems. This role involves collaborating with engineers and product owners to establish service level objectives, manage risks, and drive innovations. You'll leverage your expertise in programming, automation, and observability tools to ensure robust and efficient service delivery. Embrace the opportunity to work in a supportive, inclusive environment that values professional growth and innovation. If you are passionate about technology and want to make a significant impact, this position is perfect for you.

Qualifications

  • Strong knowledge of reliability systems thinking and experience of site reliability engineering.
  • Experience with observability tools and public cloud environments.

Responsibilities

  • Improve operational characteristics like availability and performance.
  • Scale systems sustainably through automation and process improvements.

Skills

Site Reliability Engineering
Programming Languages
Deploy and Release Services
Automation
Troubleshooting
Observability Tools (Grafana, Prometheus, OpenTelemetry)
Public Cloud Environments (AWS, GCP)
Infrastructure as Code (Terraform)
Communication Skills

Education

Degree in Computer Science or related field

Tools

Grafana
Prometheus
OpenTelemetry
Terraform

Job description

Join us as a Site Reliability Engineer

  • In this key role, you’ll improve, drive, and embed non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services.
  • You’ll enjoy significant stakeholder interaction, working in collaboration with engineers and product owners to ensure a principled approach to delivering change in a safe and secure way.
  • This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development.

What you'll do

As our Site Reliability Engineer, you’ll work closely with our feature team and other colleagues to meet defined service level objectives and continually improve system and environment reliability. You’ll define SLOs, SLIs, and error budgets that support finding the right balance between risk, reliability, and continuous improvement.
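
For readers less familiar with the terms, the relationship between an SLO, an SLI, and an error budget can be sketched in a few lines of Python. This is an illustrative example only, not part of the role's actual tooling: the function name `error_budget_report`, the request-based availability SLI, and the figures in the usage note are all assumptions.

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise a request-based availability SLI against an SLO target.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of successful requests.
    sli = (total_requests - failed_requests) / total_requests

    # Error budget: the number of failures the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests

    # Fraction of the budget still unspent; negative means the budget is exhausted.
    budget_remaining = 1.0 - failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_remaining": budget_remaining,
        "slo_met": sli >= slo_target,
    }
```

With a hypothetical 99.9% target, one million requests, and 250 failures, the budget allows roughly 1,000 failures, so about three quarters of it remains. A team might use that remaining fraction to decide whether to ship a risky change or prioritise reliability work.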

You’ll also provide structure and help to our release process, suggesting and making improvements where possible. You’ll scale systems sustainably through mechanisms like automation, evolving them by pushing for changes that improve reliability and velocity. We’ll also look to you to coach and provide guidance to colleagues and the wider team, leading where required.

In addition to this, you’ll:

  • Proactively contribute new ideas and innovations to meet short-term and longer-term goals.
  • Continually balance and manage any potential risks.
  • Be accountable for the day-to-day development and health of both production and non-production environments and respond to any incidents as required.
  • Provide technical expertise and input to establish the risk tolerance of products and services.
  • Communicate incident status updates clearly and frequently to other teams, customers, and stakeholders and support blameless post-mortems.

The skills you'll need

We’re looking for someone with strong knowledge of reliability systems thinking and experience of site reliability engineering. You’ll need experience of using a data-driven and scientific approach to fact finding. We’ll also look for financial services knowledge, and the ability to identify wider business impact, risk, and opportunity, and make connections across key outputs and processes.

We’re also looking for:

  • Good knowledge and experience of programming languages.
  • Strong knowledge of deploy and release services, automation, and troubleshooting.
  • Experience of utilising tools and technology across the software development lifecycle.
  • Experience using mathematical and statistical models to assess trends.
  • Strong communication skills with the ability to proactively engage with a wide range of stakeholders.
  • In-depth experience with observability tools such as Grafana, Prometheus, and OpenTelemetry.
  • Strong knowledge of public cloud environments such as AWS and GCP, and Infrastructure as Code tools such as Terraform.

Hours

35

Job Posting Closing Date: 24/04/2025

Ways of Working: Remote First
