Site Reliability Engineer

Orgvue Limited

London

Hybrid

GBP 70,000 - 110,000

Full time

Yesterday

Job summary

An innovative company is seeking a Principal Site Reliability Engineer to lead the scaling of its AWS and Kubernetes infrastructure. This pivotal role combines technical expertise with strategic vision, fostering a culture of reliability and resilience. You will collaborate with cross-functional teams to enhance operational excellence, mentor engineers, and drive Infrastructure as Code initiatives. With a focus on observability and automation, you will play a critical role in keeping the systems robust and adaptable in a fast-paced environment. Join a forward-thinking organisation that values individualism and diversity, and make a significant impact on its engineering foundation.

Benefits

Hybrid working
Wellbeing initiatives
Subsidised gym membership
Private medical insurance
25 days holiday
Summer Fridays
Employer pension contribution
Season ticket loan
Cycle to Work Scheme
Annual discretionary bonus

Qualifications

  • Strong experience with AWS services and Kubernetes in production.
  • Hands-on expertise in Infrastructure as Code and observability practices.

Responsibilities

  • Define SLOs and enhance SRE practices across the organisation.
  • Develop cloud infrastructure strategies and implement observability metrics.

Skills

SRE transformations
Kubernetes (EKS preferred)
AWS core services
Infrastructure as Code (Terraform)
Observability practices
Automation and CI/CD
Incident management

Tools

Terraform
CloudFormation
GitOps

Job description

Orgvue is an organisational design and planning platform that empowers your business to transform its workforce by understanding the work people do and the skills they have. Our platform connects strategy to structure, providing clarity of vision, so you can build a more adaptable, better performing organisation that thrives in a constantly changing world of work.

The world’s largest and best-known enterprises and consulting firms use Orgvue to visualise and model current and future states of the organisation and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.

Role: Principal Site Reliability Engineer

You will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will collaborate across product, platform, and operations teams to ensure our systems are reliable, observable, and resilient — even at scale.

This role combines hands-on technical skills with strategic vision, helping us build a world-class reliability culture and a robust engineering foundation for growth. We seek someone with technical expertise, excellent communication skills, and a collaborative spirit.

Responsibilities:
  1. Define and enforce SLOs, SLIs, and error budgets across critical services
  2. Develop and implement cloud infrastructure and tooling strategies
  3. Enhance SRE practices across the organisation
  4. Implement robust observability (metrics, logs, and traces) using our observability tools
  5. Guide the team in building automated, self-healing systems
  6. Own and evolve incident response processes, including on-call practices and post-mortem culture
  7. Mentor engineers on reliability, operational readiness, and scalable infrastructure best practices
  8. Drive Infrastructure as Code (IaC) initiatives using Terraform, Kubernetes, CloudFormation, and GitOps practices
  9. Collaborate with security, DevOps, and software teams to ensure compliance and operational excellence
  10. Evaluate and adopt tools and practices to improve platform performance and reliability

Desired Skills & Experience:
  1. Experience leading SRE transformations
  2. Hands-on expertise with Kubernetes (EKS preferred) in production
  3. Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
  4. Proficiency in Infrastructure as Code using Terraform and knowledge of GitOps workflows
  5. Strong background in observability: metrics, visualization, logging, tracing
  6. Understanding of automation, CI/CD pipelines, deployment automation, and release strategies
  7. Experience with incident management, disaster recovery, root cause analysis, and post-incident reviews

Additional Benefits:
  • Hybrid working: 1+ days a week in London office
  • Wellbeing initiatives: coaching, fitness sessions, webinars, Wellbeing day
  • Subsidised gym membership
  • Private medical insurance, dental, vision, and life assurance
  • 25 days holiday (increasing to 30)
  • Summer Fridays (half-days in July and August)
  • Employer pension contribution of 5% (if you contribute at least 3%)
  • Season ticket loan
  • Cycle to Work Scheme
  • Annual discretionary bonus

Here at Orgvue, we promote individualism and a diverse workforce to build our future success.
