Senior Site Reliability Engineer

Leap29

Wokingham

Hybrid

GBP 100,000 - 125,000

Full time

Today

Job summary

A technology-focused company is looking for a Senior Site Reliability Engineer (SRE) in Wokingham. The ideal candidate will lead efforts in maintaining reliability, performance, and scalability of critical platforms. Responsibilities include designing observability systems and automating processes, with a strong emphasis on OpenShift. This role offers a competitive rate and the opportunity to make a significant impact within a fast-paced environment.

Qualifications

5+ years of experience in SRE, DevOps, or production engineering roles.
Expertise operating in high-availability, fast-paced production environments.
Solid engineering foundation with experience reading and writing production code.

Responsibilities

Ensure high availability, strong performance, and low latency for critical systems across cloud environments.
Design and implement observability systems to proactively detect issues.
Automate manual processes through infrastructure-as-code and modern CI/CD pipelines.

Skills

Cloud & Container Management

CI/CD Implementation

Incident Management

Automation

Scripting (Python, Bash)

Tools

OpenShift

Terraform

GitHub Actions

Azure DevOps

Jenkins

Senior Site Reliability Engineer (SRE)

Location: Wokingham (2 days/week onsite)

Type: Inside IR35

Rate: Up to £70.00 per hour (DOE)

We’re looking for a Senior Site Reliability Engineer (SRE) to lead efforts in maintaining the reliability, performance, and scalability of mission-critical platforms and services. This role is ideal for someone who thrives at the intersection of software engineering, infrastructure, automation, and incident response.

You’ll be instrumental in defining and implementing the standards and systems that keep applications running smoothly across cloud and hybrid environments, including OpenShift clusters.

What You’ll Be Responsible For

As a Senior SRE, you will:

Ensure high availability, strong performance, and low latency for critical systems across Azure, AWS, and OpenShift.
Design and implement robust observability systems (logging, monitoring, alerting) to detect and resolve issues proactively.
Lead and evolve incident management processes: runbooks, communications, postmortems, and root cause analysis.
Define and monitor SLIs, SLOs, and error budgets to balance innovation with stability.
Automate manual processes through infrastructure-as-code, scripting, and modern CI/CD pipelines.
Mentor engineering teams on best practices for deployment, reliability, scalability, and incident preparedness.
Support and scale OpenShift-based containerized applications, including upgrade strategies, patching, and workload optimization.

Core Responsibilities

Operations & Incident Management

Act as the senior escalation point for outages and critical incidents.
Lead post-incident reviews and implement long-term remediation plans.
Communicate platform health and risk posture to stakeholders at all levels.

Engineering & Automation

Build and improve CI/CD pipelines using tools like Azure DevOps, GitHub Actions, Jenkins, and GitLab.
Design scalable, fault-tolerant infrastructure with IaC tools (Terraform, Bicep).
Create internal tools and automation to accelerate development and reduce operational toil.

Strategic & Advisory

Architect cloud and container infrastructure, with a focus on OpenShift, Kubernetes, and hybrid deployments.
Collaborate with engineering, architecture, and security teams to embed reliability into the SDLC.
Promote advanced deployment strategies (blue-green, canary, rolling updates) and rollback readiness.
Drive a culture of reliability, observability, and operational excellence across engineering teams.

Technical Environment

Cloud & Containers: Azure, AWS, OpenShift, Kubernetes, Docker, App Services, IaaS (EC2, VMs)
CI/CD & Automation: Terraform, Bicep, Azure DevOps, Jenkins, GitHub Actions, GitLab
Observability: Prometheus, Grafana, Datadog, ELK, Splunk, Application Insights, CloudWatch
Languages & Scripting: Python, C#, Bash, PowerShell
Networking: DNS, SSL/TLS, load balancing, WAF, proxies, CDN, Azure App Gateway
Databases: MSSQL, PostgreSQL, MongoDB, CosmosDB, DynamoDB
OS & Systems: Windows, Linux, Nginx, IIS

Ideal Candidate Profile

5+ years of experience in SRE, DevOps, or production engineering roles.
Expertise operating in high-availability, fast-paced production environments.
Solid engineering foundation with experience reading and writing production code.
Hands-on experience deploying, supporting, and scaling OpenShift environments.
Proven track record of leading incident responses and improving system reliability.
Strong collaboration and mentoring abilities across infrastructure, development, and security teams.

What You’ll Bring

Ability to balance operational risk with engineering velocity.
Strong communication skills across technical and non-technical audiences.
A passion for automating everything and eliminating manual work.
A mindset of ownership, continuous improvement, and technical leadership.

Ready to make reliability your legacy?

If you’re a senior SRE with OpenShift experience and a drive to solve complex operational challenges, we’d love to hear from you.
