Remote Senior Site Reliability Engineer — Scale AI Infrastructure

Prolific - UK Job Board

United Kingdom

Hybrid

GBP 60,000 - 80,000

Full time

30+ days ago

Job summary

A leading AI infrastructure company in the UK seeks a Site Reliability Engineer to ensure the resilience and performance of its platform. The role involves managing Kubernetes clusters, responding to incidents, and collaborating with cross-functional teams. Ideal candidates have significant experience with Google Cloud, a solid grasp of observability principles, and strong Python programming skills. Join us for a competitive salary and a remote working culture.

Benefits

Competitive salary
Remote working
Mission-driven culture

Qualifications

  • 5+ years of experience with Google Cloud Platform, Kubernetes, and Terraform.
  • Strong programming skills in Python.
  • Experience with observability tools and principles.

Responsibilities

  • Develop and maintain highly available infrastructure using infrastructure as code.
  • Manage Kubernetes clusters for reliability and performance.
  • Participate in incident response for production issues.

Skills

Google Cloud Platform
Kubernetes
Python
Terraform
Observability principles
GitOps

Tools

CircleCI
Datadog
Django Rest Framework
MongoDB
AWS