
Site Reliability Engineer

ZipRecruiter

Stoke-on-Trent

Hybrid

GBP 45,000 - 75,000

Full time

13 days ago


Job summary

An innovative company is seeking a Site Reliability Engineer to enhance system reliability and performance through strong engineering practices. This role involves monitoring critical systems, implementing solutions for reliability, and collaborating across functions to integrate best practices into the software development life cycle. You will play a vital role in ensuring systems meet user demands while fostering a culture of continuous improvement. If you are passionate about technology and eager to make a significant impact, this opportunity is perfect for you.

Qualifications

  • Strong software engineering skills focused on system reliability.
  • Experience with Infrastructure as Code (IaC) automation tools.

Responsibilities

  • Enhancing reliability and observability of services through coding.
  • Maintaining tools for operational efficiency and resilience.

Skills

Site Reliability Engineering principles
Service Level Indicators (SLI)
Service Level Objectives (SLO)
Python
Golang
JavaScript
Splunk
New Relic
Grafana
Ansible
Terraform
Shell scripting

Tools

OpenTelemetry
Telemetry tools

Job description

Who we are looking for

A Site Reliability Engineer who will enhance system reliability, observability and performance through a strong engineering approach, and assist with incident resolution and best practices.

You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability of critical systems, directly impacting operational efficiency.

Using your engineering expertise, you will implement solutions that enhance reliability, including service instrumentation with tools such as OpenTelemetry, improve logging practices and develop features for maintainability. You will also help engineer tools and automation for effective service management.

Collaboration is key: you will work across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles are integral to development. Your contributions will ensure our systems meet user demands and enhance overall service performance.

This role is eligible for inclusion in the Company’s hybrid working from home policy.

Skills and Experience

  • Excellent knowledge of Site Reliability Engineering principles, including the creation and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction.
  • Knowledge of contemporary observability tools, techniques and best practices, including Splunk, New Relic, Grafana and PagerDuty.
  • Excellent knowledge of programming including Python, Golang and JavaScript.
  • Knowledge and experience of modern software development techniques and lifecycles.
  • Experience with Infrastructure as Code (IaC) automation and orchestration tools such as Ansible and Terraform.
  • Prior experience working in a large-scale, 24/7 enterprise where system uptime and stability are of paramount importance to the Business.
  • Keen interest in industry trends, particularly Platform Engineering.
  • Proficiency in shell scripting for automation and system management tasks.

Main Responsibilities

  • Writing and contributing to code that enhances the reliability and observability of services, including telemetry, operational APIs and tooling.
  • Developing and maintaining tools that facilitate effective management of our systems, ensuring they are operationally efficient and resilient.
  • Working with automation and orchestration platforms to automate manual activity and reduce toil.
  • Building sophisticated dashboards using a range of telemetry data and dashboarding technologies such as Grafana, Splunk and New Relic.
  • Maintaining and administering existing monitoring and analytic toolsets.
  • Mentoring colleagues in the use of new technologies and practices.
  • Actively participating in live incident resolution and post-mortem analysis, providing effective remediation strategies to improve overall system health and prevent future issues.
  • Driving initiatives to enhance system reliability and observability, contributing to a culture of continuous improvement.
  • Collaborating with the central Site Reliability Engineering and Observability teams to establish and uphold standards for reliability and observability, assisting teams in adhering to these practices.
  • Working with IT Operations, providing and supporting the use of critical tooling to enable increasing levels of value to the Business.

By applying to us you are agreeing to share your Personal Data in accordance with our Recruitment Privacy Policy - http://www.bet365careers.com/privacypolicy.pdf.

