Site Reliability Engineer

bet365

England

On-site

GBP 60,000 - 90,000

Full time

Yesterday

Job summary

A leading online gaming company in the UK is seeking a Site Reliability Engineer to enhance system reliability and performance. You will monitor critical systems, implement solutions for maintainability, and collaborate across teams to integrate best practices. Ideal candidates will have knowledge of observability tools such as Splunk and Grafana, as well as experience with automation tools such as Ansible and Terraform. This role emphasizes code contributions to drive reliability and observability.

Benefits

Eye care

Flu Vaccinations

Life Assurance

Qualifications

Excellent knowledge of Site Reliability Engineering principles, including SLIs and SLOs.
Experience in a large scale, 24/7 enterprise environment.
Proficiency in writing shell scripts for automation.

Responsibilities

Enhance reliability and observability of services through coding.
Monitor health and performance of critical systems.
Collaborate across functions to embed best practices.

Skills

Site Reliability Engineering principles

Contemporary observability tools

Infrastructure as Code (IaC)

Shell scripting

Software development techniques

Automation tools

Tools

Splunk

New Relic

Grafana

Ansible

Terraform

As a Site Reliability Engineer, you will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices.

Full-time. Closes 28/01/2026.

You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability of critical systems, directly impacting operational efficiency.

Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management.

Collaboration is key: you will work across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user demands and enhance overall service performance.

Developing and maintaining tools that facilitate effective management of our systems, ensuring they are operationally efficient and resilient.

Working with automation and orchestration platforms to automate manual activity and reduce toil.

Building sophisticated dashboards using a range of telemetry data and dashboarding technologies like Grafana, Splunk and New Relic.

Maintaining and administering existing monitoring and analytic toolsets.

Mentoring colleagues in the use of new technologies or practices.

Actively participating in live incident resolution and post-mortem analysis, providing effective remediation strategies to improve overall system health and prevent future issues.

Driving initiatives to enhance system reliability and observability, contributing to a culture of continuous improvement.

Collaborating with the central Site Reliability Engineering and Observability teams to establish and uphold standards for reliability and observability, assisting teams in adhering to these practices.

Working with IT Operations, providing and supporting the use of critical tooling to enable increasing levels of value to the Business.

Bonus

Eye care and Flu Vaccinations
Life Assurance

Life at bet365

We are a unique global operator with the passion and drive to be the best in the industry. Our values form the foundation of our culture and shape the unique way that we work. People are our superpower and we support you to be the best you can be.

Qualifications

Excellent knowledge of Site Reliability Engineering principles, including the creation and management of effective Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for reliability and customer satisfaction.
Knowledge of contemporary observability tools, techniques and best practices, including Splunk, New Relic, Grafana and PagerDuty.
Knowledge and experience of modern software development techniques and lifecycles.
Experience with Infrastructure as Code (IaC) automation and orchestration tools such as Ansible and Terraform.
Prior experience working in a large scale, 24/7 enterprise where system uptime and stability are of paramount importance to the Business.
Keen interest in industry trends, particularly Platform Engineering.
Proficiency in shell scripting for automation and system management tasks.

What you will be doing

Writing and contributing to code that enhances the reliability and observability of services, including telemetry, operational APIs and tooling.
