Senior Site Reliability Engineer

Embarcaderomediagroup

Manchester

Hybrid

GBP 70,000 - 80,000

Full time

8 days ago

Job summary

An established industry player is seeking a Senior Site Reliability & Platform Engineer to enhance infrastructure and developer experience. This role involves designing and operating Azure-based platforms, implementing SRE principles, and driving automation through Infrastructure as Code. The ideal candidate will have strong cloud engineering skills, a passion for problem-solving, and a commitment to continuous improvement. Join a collaborative team that values inclusivity and innovation, and help shape the future of digital services while enjoying a flexible working environment and competitive benefits.

Benefits

25 Days Holiday

Health Shield

Personal Development Budget

Enhanced Family Leave

Life Assurance

Pension Contributions

Flexible Working Hours

Qualifications

Strong platform and cloud engineering experience.
Proficient in Azure services and Infrastructure as Code.
Experience with CI/CD, GitOps, and automation tools.

Responsibilities

Design and operate reliable Azure-based platforms.
Apply SRE principles for service reliability.
Enhance CI/CD pipelines with security scanning and progressive delivery.

Skills

Azure Knowledge

Infrastructure as Code (Terraform)

CI/CD Pipelines

Observability Tools (Datadog, ELK)

Networking (TCP/IP, Load Balancing)

DevSecOps Practices

FinOps Practices

Problem-Solving Skills

Tools

Terraform

Azure DevOps

Datadog

Grafana

Kubernetes

Senior Site Reliability & Platform Engineer

Manchester | Hybrid/Flexible Working | Full-Time

Drive better infrastructure and developer experience at scale

At Sorted, we're building robust, scalable systems to support modern digital services, and we're looking for a Senior Site Reliability & Platform Engineer to help lead the way.

You'll sit at the heart of our engineering operations, combining SRE principles (service-level reliability, observability, and incident response) with platform engineering practices such as GitOps, Infrastructure as Code, DevSecOps automation, and self-service enablement, to help development teams ship faster, more safely, and more cost-efficiently.

What you’ll be doing:

Designing and operating highly reliable, scalable, and secure Azure-based platforms
Applying SRE principles like SLOs, observability, and incident management to drive service reliability
Building Infrastructure as Code using Terraform (v1.7+) and GitOps workflows
Enabling teams through platform tools, reusable Terraform modules, and self-service infrastructure
Enhancing CI/CD pipelines (Azure DevOps, YAML-based) with security scanning and progressive delivery
Supporting AKS clusters and Azure services (SQL, Cosmos DB, ADF, Functions, Logic Apps, etc.)
Improving monitoring and alerting with Datadog, Grafana, ELK, and proactive failure detection
Participating in the on-call rota and leading incident response workflows and blameless postmortems
Coaching engineers, upskilling teams, and contributing to a culture of continuous improvement
Driving cost awareness through FinOps practices and automated budget controls

What we’re looking for:

We're seeking someone with strong platform and cloud engineering experience who can collaborate across teams and incorporate reliability thinking into all aspects of their work. Ideally, you have:

In-depth Azure knowledge (AKS, Functions, SQL, Cosmos DB, etc.)
Strong Infrastructure as Code skills with Terraform (v1.7+)
Experience with CI/CD pipelines, GitOps, and automation tools (PowerShell, Bash)
Familiarity with observability and incident tools like Datadog, ELK, and synthetic monitoring
Solid understanding of networking (TCP/IP, Load Balancing, DNS, Routing)
Good knowledge of DevSecOps practices — including security scanning, IAM, and RBAC
Experience with FinOps — tagging, budgeting, cost optimisation
Experience with Windows and Linux Operating Systems
Understanding of progressive delivery methods (canary, blue/green)
Familiarity with security scanning tools (Trivy, tfsec) integrated into pipelines
A proactive approach to problem-solving, documentation, and coaching

Additional bonus skills include experience with Azure governance tools, advanced Datadog capabilities, Kubernetes autoscaling solutions, GitOps workflows, automated cost dashboards, compliance frameworks, and internal platform development.

What You Can Expect:

Competitive salary: £70,000 - £80,000 depending on experience
25 days holiday plus bank holidays
Flexible remote/hybrid working with office collaboration as needed
Health Shield from day one
Annual £200 personal development budget
Enhanced family leave policy
Life Assurance coverage
Pension contributions via salary sacrifice
35-hour workweek (plus 1-hour unpaid lunch)

Who this role suits:

This is a great opportunity for someone passionate about building robust infrastructure and enabling others to move faster and more securely. You might come from a cloud engineering, SRE, or DevOps background — what matters most is your curiosity, systems thinking, and drive to improve operational efficiency.

At Sorted, we are committed to fostering an inclusive environment where people from all backgrounds can thrive. If you need any accommodations during the interview process, please let us know—we're happy to assist.
