11381 - Senior Site Reliability Manager

Ministry of Justice

United Kingdom

On-site

GBP 80,000 - 100,000

Full time

Yesterday

Job summary

A governmental organization in the United Kingdom seeks a Senior Site Reliability Engineer to join a collaborative team and enhance digital service reliability. The ideal candidate will have experience with SRE principles, cloud services, and a strong commitment to improving operational maturity and quality. This role emphasizes teamwork and communication, with opportunities for personal and professional growth.

Qualifications

  • Experience of working with technologies including databases, web servers, and load balancers.
  • Understanding of version control tools, ideally Git.
  • Familiarity with container orchestration technologies.
  • Experience with public cloud providers in a production environment.
  • Understanding of SRE principles and how to design resilient services.
  • Ability to deploy and manage monitoring tools.
  • Familiarity with at least one programming language.
  • Experience with Infrastructure as Code tools.

Responsibilities

  • Support best practices in reliability engineering.
  • Design, build, and test systems for software development.
  • Work collaboratively across disciplines.
  • Shape effective processes and identify meaningful metrics.
  • Communicate openly about concerns with the team.
  • Respond to incidents with care and clarity.
  • Mentor and coach junior colleagues.

Skills

  • Experience with digital services technologies
  • Familiarity with Git
  • Container orchestration (Kubernetes, ECS)
  • Public cloud providers (Azure, AWS, Google Cloud)
  • Understanding of SRE principles (capacity planning, SLOs, SLIs)
  • Monitoring tools
  • Programming experience (Node.js, Java, Kotlin)
  • Infrastructure as Code (Terraform)
  • Strong communication skills

Job description

The Role

We’re recruiting for a Senior Site Reliability Engineer here at Justice Digital, to be part of our warm and collaborative HMPPS Digital team.

This role aligns with the Senior DevOps Engineer role in the Government Digital and Data Framework.

This is a great opportunity for a thoughtful and collaborative Site Reliability Engineer (SRE) to join our HMPPS Digital Live Support Team. It would suit someone with experience in SRE, or a DevOps Engineer ready to grow into the SRE space. You’ll play a key role in nurturing the operational maturity, quality, and performance of our digital services, working closely with our product teams to ensure we continue delivering reliable, user‑focused solutions.

This role is central to our mission of supporting colleagues across HMPPS with high‑quality digital services. You’ll be part of a supportive and inclusive team that values continuous learning, shared success, and making a meaningful difference.

Our service exists to create tools that help HMPPS provide decent, safe, and productive environments for both residents and staff. We support prisons and Probation in their vital work to protect the public and reduce re‑offending by enabling rehabilitation through education and employment.

Our Live Services team ensures that the technology underpinning our digital services is dependable and accessible across the HMPPS estate. We take pride in being responsive, compassionate, and committed to improving the experience of those who rely on our services every day.

To help picture your life at MoJ Justice Digital, please take a look at our blog and our Digital and Technology strategy 2025.

Key Responsibilities

  • Support and champion best practices in reliability engineering, helping teams build confidence in their systems and processes.
  • Design, build, and test systems that enable smooth and secure software development and deployment.
  • Work collaboratively across disciplines, partnering with developers, technical architects, product managers, and others to co‑create resilient and scalable platforms.
  • Help shape effective processes, identifying and measuring meaningful metrics that guide continuous improvement.
  • Work with colleagues to identify and address technical risks, contributing to thoughtful and proactive mitigation strategies.
  • Communicate openly and constructively about concerns, risks, and issues with the wider team and senior stakeholders.
  • Respond to incidents with care and clarity, prioritising improvements and sharing learnings to strengthen our services.
  • Foster a culture of psychological safety, encouraging open, respectful, and supportive communication within and beyond the team.
  • Build and nurture relationships with teams across HMPPS Digital and Justice Digital, creating a strong network of collaboration.
  • Collaborate closely with fellow SREs and developers, sharing knowledge and supporting each other’s growth.
  • Work in partnership with Cyber Security and Information Assurance teams to uphold the integrity and security of our services.
  • Provide mentoring and coaching to junior colleagues, helping them grow with confidence and purpose.
  • Support inclusive hiring practices, participating in recruitment and helping to create a welcoming experience for candidates.

If this feels like an exciting challenge, something you are enthusiastic about, and you want to join our team, please read on and apply!

Person Specification

Essential

  • Have experience of working with technologies that underpin digital services such as databases, web servers, DNS, CDNs, reverse proxies, message queues and load balancers.
  • Understand version control tools, ideally Git, and enjoy working in a structured and transparent way.
  • Are familiar with container orchestration technologies such as Kubernetes, ECS or Cloud Foundry; or serverless application design such as AWS Lambda.
  • Have worked with public cloud providers such as Azure, AWS, or Google Cloud in a live production environment.
  • Have an understanding of SRE principles such as capacity planning, SLOs and SLIs, and of how to design and support resilient, large‑scale, high‑performance services in a production environment.
  • Can deploy and manage monitoring tools, helping teams stay informed and empowered to respond to operational issues with confidence.
  • Are familiar with at least one programming language (we mainly use Node.js, Java and Kotlin).
  • Prefer automation and have experience with Infrastructure as Code tools, such as Terraform, to help teams work more efficiently and reliably.
  • Enjoy learning and sharing knowledge and take pride in helping others grow.

Willingness to be assessed against the requirements for SC clearance.

We welcome the unique contribution diverse applicants bring and do not discriminate based on culture, ethnicity, race, nationality or national origin, age, sex, gender identity or expression, religion or belief, disability status, sexual orientation, educational or social background or any other factor.

Our values are Purpose, Humanity, Openness and Together. Find out more about how we celebrate diversity and an inclusive culture in our workplace.

The Civil Service is committed to attracting, retaining and investing in talent wherever it is found. To learn more, please see the Civil Service People Plan and the Civil Service D&I Strategy.
