Site Reliability Engineer - SC Cleared

Cognizant

City of Westminster

Hybrid

GBP 60,000 - 90,000

Full time

Yesterday

Job summary

A leading IT and cloud services company in the UK is seeking a Site Reliability Engineer to design and implement CI/CD pipelines and to manage cloud resources using Terraform. The role involves operating Kubernetes platforms, ensuring observability, and responding to incidents. Ideal candidates will have experience with a major cloud provider and strong CI/CD skills, particularly with GitHub Actions and Jenkins. The collaborative environment offers ample opportunities for growth and learning.

Benefits

Opportunities for career development

Innovative work environment

Qualifications

Proven experience operating production systems on a major cloud provider.
Hands-on experience with Infrastructure as Code using Terraform.
Strong skills in CI/CD tools like GitHub Actions and Jenkins.
Knowledge of Kubernetes and container orchestration.
Experience with observability tools like New Relic and Grafana.

Responsibilities

Design and implement CI/CD pipelines.
Model cloud resources as code using Terraform.
Operate Kubernetes and manage container platforms.
Implement observability practices for infrastructure.
Respond to incidents and participate in on-call rotations.

Skills

Cloud services fundamentals

Terraform

CI/CD with GitHub Actions and Jenkins

Kubernetes administration

New Relic and Grafana

Python

Bash

Tools

Docker

AWS

Azure

GCP

Overview

This is an excellent opportunity for a Site Reliability Engineer to join our Cloud Infrastructure & Security Services practice. Cognizant Infrastructure Services provides IT infrastructure and cloud services to clients across industry verticals. These services include both Consulting/Professional and Managed Services, spanning Enterprise Computing, Cloud Services, Security Services, DevOps, Data Centres, End User Computing, Service Desk, Network Services, and Environment Management Services.

Key Responsibilities

Build CI/CD you can trust: Design, implement, and operate pipelines in GitHub Actions and Jenkins that deliver zero‑touch, repeatable releases with quality gates, automated tests, and policy‑as‑code controls. Containerise services with Docker and standardise build images.
Provision everything as code: Model cloud resources using Terraform (workspaces, modules, registries, drift detection), enabling composable, reviewed changes across environments.
Run scalable compute: Stand up and operate container platforms (Kubernetes, including EKS, AKS, and GKE; ECS; and Azure Container Instances (ACI)), covering cluster lifecycle, node pools, autoscaling, ingress, service mesh, secrets, and backup/restore.
Observability: Instrument services and infrastructure with New Relic, Grafana (incl. Loki/Tempo where applicable), and cloud‑native telemetry. Define SLIs/SLOs, and build actionable dashboards, alerts, and runbooks that reduce mean time to recovery (MTTR).
Engineer for reliability & cost: Apply SRE practices (error budgets, change management, resilience testing), right‑size resources, and use cloud provider tooling for security/cost posture.
Incident response & on‑call: Participate in a fair, documented on‑call rota; lead and/or contribute to incident handling, comms, post‑incident reviews, and corrective actions.
Security & compliance by design: Embed IAM least‑privilege, secrets management, image/provenance scanning, and guardrails into pipelines and Terraform modules.

Key Skills and Experience

Proven experience operating production systems on a major cloud (AWS/Azure/GCP) with solid cloud fundamentals (networking, IAM, storage, compute, HA/DR).
Hands‑on IaC with Terraform (modules, state, CI validation, policy checks).
Strong CI/CD skills in GitHub Actions and/or Jenkins (runners/agents, reusable workflows, secrets, matrix builds, artefact management).
Containers & orchestration: Kubernetes administration knowledge (controllers, scheduling, ingress, autoscaling, troubleshooting) and experience with EKS/AKS/GKE and/or ECS/ACI.
Observability: Practical use of New Relic and Grafana to define metrics/traces/logs, tune alerts, and drive SLOs.
Scripting & automation: Proficiency in Python and Bash; experience with boto3 or equivalent SDKs.
Incident management: Exposure to production incidents, on‑call participation, and post‑incident review practices.

Clear communication, stakeholder partnership, and a bias to automate, document, and simplify.

At Cognizant you will experience an exciting mix of innovation by design, creativity, collaboration, and efficiency within a framework of stimulating objectives and a passion for delivering the best to our customers.

You will be joining a network of some of the most creative, innovative, and dedicated people in the industry with ample opportunities to learn and develop your career.

Our Associates are chosen for their attitude, skills, knowledge, and enthusiasm but above all, their belief that anything is possible.

Cognizant is an equal opportunities employer, and we welcome all applications regardless of race, colour, gender, ethnic origin, nationality, religion or beliefs, disability, age, sexual orientation, political opinions, or trade union membership.

Cog25