Senior Site Reliability Engineer

ConSol Partners

Greater London

Remote

GBP 70,000 - 100,000

Full time

Today

Job summary

A leading company is seeking a Lead Site Reliability Engineer to enhance system reliability and performance. This fully remote role involves managing cloud resources, improving backend code, and leading incident management. Applicants should have extensive experience in cloud technologies, Kubernetes, and Terraform, with a strong focus on observability and incident management. Join a dynamic team dedicated to delivering high-quality software solutions.

Qualifications

6+ years of experience in full stack development, SRE, or platform engineering.
Proficiency in backend stacks like Python/Django, Node/NestJS, Go, Java/Spring.

Responsibilities

Own end-to-end system reliability and improve backend code performance.
Architect and scale Kubernetes deployments for high availability.

Skills

GCP

Kubernetes

Terraform

Python

Java

Incident Management

Observability

Tools

Datadog

PagerDuty

Helm

GitOps

CloudSQL

Hi,

I'm excited to share that one of our clients in the UK is hiring a Lead Site Reliability Engineer. It's a fully remote role. The job details are below. If you're interested, please send your CV to apply.

Title: Lead Site Reliability Engineer

Location: London, UK

Duration: Permanent, full-time or contract

Job Type: Fully Remote

Must have GCP experience.

Key Responsibilities:

Own end-to-end system reliability, from cloud resource planning to code-level instrumentation.
Review and improve backend code for performance, resiliency, and observability (e.g., retries, timeouts, connection pools, logging).
Architect and scale multi-environment Kubernetes deployments (GKE preferred) for high availability and low drift.
Collaborate with full stack teams on release readiness, CI/CD quality gates, and infra-aware feature rollout.
Harden secret management, IAM policies, and privilege boundaries across apps and services.
Serve as a hands-on lead in incidents, root cause analysis, and long-term reliability improvements.
Write and review Terraform modules, Helm charts, or platform tooling (Bash/Python/Go) as needed.
Lead design reviews and cross-functional decisions that impact both product and platform reliability.

Requirements:

6+ years of experience across full stack development, SRE, or platform engineering.
Proficiency in one or more backend stacks (e.g., Python/Django, Node/NestJS, Go, Java/Spring) and ability to review or contribute code.
Strong expertise in Kubernetes (GKE preferred) and Helm, with the ability to optimize, secure, and debug real-world workloads.
Strong command of Terraform and IaC workflows, ideally with Terraform Cloud and remote state strategy.
Solid understanding of GCP or a similar cloud provider (IAM, VPCs, CloudSQL, networking, Secret Manager, monitoring).
Experience implementing or enforcing progressive delivery practices (ArgoCD, Flux, GitOps, CI/CD patterns).
Proven ability to improve system observability using tools like Datadog, Prometheus, OpenTelemetry.
Ability to “go deep” into an application repo, identify architectural flaws or infra misuse, and fix or guide others.
Calm under pressure and experienced in incident management and postmortem culture.

Tools and Expectations:

Datadog - Monitor infrastructure health, capture service-level metrics, and reduce alert fatigue through high-signal thresholds.
PagerDuty - Own incident management pipeline. Route alerts by severity and align with business SLAs.
GKE / Kubernetes - Improve cluster stability and workload isolation. Define auto-scaling configurations and tune for efficiency.
Helm / GitOps (ArgoCD/Flux) - Validate release consistency across clusters. Monitor sync status and rollout safety.
Terraform Cloud - Support DR planning and detect infrastructure changes through state comparisons.
CloudSQL / Cloudflare - Diagnose DB and networking issues. Monitor latency, enforce access patterns, and validate WAF usage.
Secret Management - Audit access to secrets, apply short-lived credentials, and define alerts for abnormal usage.

Seniority level: Not Applicable

Employment type: Full-time

Job function: Information Technology

Industries: Software Development
