SRE

Ocho People

Belfast

On-site

GBP 60,000 - 80,000

Full time

3 days ago

Job summary

A global technology consultancy in the UK is seeking a Site Reliability Engineer to ensure cloud-based systems' availability and performance. The role includes deploying and operating Kubernetes clusters, enhancing system reliability, and collaborating with engineering teams. Ideal candidates will have over 5 years of experience in supporting complex distributed systems, strong Kubernetes skills, and hands-on expertise with observability tools. Apply now if you meet these criteria.

Qualifications

  • 5+ years of experience as an SRE or in a similar role, supporting complex distributed systems.
  • Strong Kubernetes expertise (AKS, EKS, GKE, or similar).
  • Hands-on experience with observability tools such as Prometheus, Grafana, Kibana.

Responsibilities

  • Deploy and improve Kubernetes clusters across multiple cloud environments.
  • Design processes to enhance system reliability, availability, and scalability.
  • Build and optimise CI/CD pipelines for safe deployments.
  • Own monitoring, alerting, and incident response to minimize downtime.
  • Lead post-incident reviews and implement preventative improvements.

Skills

Kubernetes
Observability tools (Prometheus, Grafana, etc.)
Cloud platforms (AWS, Azure, GCP)
SQL databases
Python
Linux expertise
Communication skills

Job description
Site Reliability Engineer

We're working with a global technology consultancy that designs, builds, and supports modern software platforms for enterprise customers worldwide. They partner closely with clients to deliver reliable, scalable, cloud-native solutions.

The Role

As an SRE, you'll play a key role in ensuring the availability, performance, and scalability of production systems, supporting customers across the EMEA region. You'll also help build, mature, and enhance the SRE function. This is a hands‑on, technical role focused on reliability, automation, and operational excellence across a distributed, cloud-based platform.

Key Responsibilities
  • Platform Reliability: Deploy, operate, and improve Kubernetes clusters across multiple cloud environments.
  • Service Performance: Design and implement processes to enhance system reliability, availability, and scalability.
  • CI/CD Enablement: Build and optimise CI/CD pipelines to support safe, repeatable deployments.
  • Observability & Incidents: Own monitoring, alerting, and incident response to minimise downtime and speed recovery.
  • Root Cause Analysis: Lead post‑incident reviews and implement long‑term preventative improvements.
  • Automation: Reduce operational toil through automation and performance optimisation.
  • On‑Call: Participate in weekday coverage and a once‑monthly weekend rota.

Collaboration & Stakeholder Engagement
  • Work closely with engineering, infrastructure, and product teams to embed SRE best practices.
  • Advocate for reliability, resilience, and operational excellence across teams.
  • Collaborate with a globally distributed engineering function.
  • Engage directly with customers to resolve incidents and improve user experience.

Skills & Experience
  • 5+ years of proven experience as an SRE or in a similar role, supporting complex distributed systems.
  • Strong Kubernetes experience (AKS, EKS, GKE, or similar).
  • Hands‑on with observability tools such as Prometheus, Grafana, Kibana, Vector, or Superset.
  • Experience with at least one major cloud platform: AWS, Azure, GCP, or Linode.
  • SQL database experience (PostgreSQL beneficial but not essential).
  • Proficiency in Python, Go, or Rust.
  • Strong Linux expertise, including performance tuning and troubleshooting.
  • Excellent communication skills, able to work effectively with engineers and customers.

Please apply now if you meet the above criteria, or contact Andrew Harrison directly.