Site Reliability Engineer (SRE)

Charles Simon Associates Ltd

City Of London

Remote

GBP 75,000 - 95,000

Full time

Job summary

A global organisation is seeking a remote Site Reliability Engineer to ensure the reliability and performance of its systems. The ideal candidate will have strong experience in Terraform, Kubernetes, and automation scripting, along with a passion for Site Reliability Engineering. The role emphasises monitoring and observability with tools such as Datadog. Salary up to £95,000 plus benefits.

Benefits

  • Up to £95,000 per annum.
  • Benefits package.

Qualifications

  • Proven Site Reliability Engineering background required.
  • Strong Terraform skills with live environment deployment experience.
  • Kubernetes / AKS expertise essential.

Responsibilities

  • Design and enforce SLOs, SLIs, and SLAs.
  • Build and maintain monitoring solutions using Datadog.
  • Manage Infrastructure as Code for deployments.

Skills

  • Site Reliability Engineering background.
  • Terraform skills.
  • Kubernetes / AKS expertise.
  • Scripting in PowerShell.
  • Monitoring experience with Datadog.

Job description
Overview

Site Reliability Engineer (SRE, Terraform, AKS, Azure, Kubernetes, PowerShell, Python, Bash, Datadog, Monitoring Tools) Permanent Remote

Location: Remote (occasional travel to Nottinghamshire HQ)
Salary: Up to £95,000 per annum + benefits
Start Date: ASAP

Charles Simon Associates are working with a global organisation who are looking to recruit a Site Reliability Engineer (SRE) on a permanent basis. This is an exciting opportunity to join a forward-thinking business where reliability, scalability, and automation are at the heart of technology delivery.

Responsibilities
  • Designing and enforcing SLOs, SLIs, and SLAs to ensure high reliability and performance.
  • Building and maintaining monitoring/observability solutions (Datadog, Grafana, Azure Application Insights, Log Analytics).
  • Managing Infrastructure as Code (Terraform, Pulumi, CloudFormation) for scalable, repeatable deployments.
  • Automating with PowerShell, Python, or Bash to drive efficiency.
  • Supporting Kubernetes and AKS environments in production.
  • Leading incident response, postmortems, and continuous improvement processes.
  • Driving cost optimisation, capacity planning, and load testing.
  • Championing best practices in cloud security and resilience.
Key Skills & Experience Required
  • Proven Site Reliability Engineering background.
  • Strong Terraform skills with live environment deployment.
  • Kubernetes / AKS expertise.
  • Scripting in PowerShell, Python or Bash.
  • Monitoring experience (Datadog preferred, Azure or Grafana considered).
  • Background in web applications and distributed systems.
Desirable Skills
  • Knowledge of Microservices Architecture.
  • Familiarity with Kanban.
  • Experience with Puppet or Chef.

If you're passionate about Site Reliability Engineering and want to work in an environment where "that will do" is never good enough, this role is for you.
