2,431 Kubernetes jobs in United States

Platform Site Reliability Engineer at AI infrastructure platform startup

Jack & Jill/External ATS

Greater London

Remote

GBP 70,000 - 90,000

6 days ago

Be an early applicant

Senior Infrastructure Engineer at Cosine.sh

Jack & Jill/External ATS

Greater London

On-site

GBP 70,000 - 90,000

6 days ago

Be an early applicant

Junior DevOps Engineer

Story Terrace Inc.

Greater London

Hybrid

GBP 45,000

7 days ago

Be an early applicant

Platform Engineer - Engine by Starling

The Engine

Manchester

Hybrid

GBP 55,000 - 75,000

Today

Be an early applicant

Platform Engineer - Engine by Starling

The Engine

Cardiff

Hybrid

GBP 80,000 - 100,000

Today

Be an early applicant

Cloud Engineer/SRE (Azure, Terraform, Kubernetes) - Contract - London, UK (Hybrid)

Cactus IT Solutions UK Ltd

City of Westminster

On-site

GBP 65,000 - 85,000

Today

Be an early applicant

Site Reliability Engineers

CGI Group Inc.

City of Westminster

On-site

GBP 70,000

Today

Be an early applicant

Principal/Senior Lead DevOps Engineer - Cloud Platform

London Stock Exchange Group

Nottingham

On-site

GBP 80,000 - 110,000

Yesterday

Be an early applicant

Senior DevSecOps Engineer

Light

City Of London

On-site

GBP 80,000 - 100,000

Yesterday

Be an early applicant

Senior Golang Ethical Hacker - (SVP)

11037 Citibank, N.A. United Kingdom

Greater London

Hybrid

GBP 60,000 - 80,000

Yesterday

Be an early applicant

Senior Golang Ethical Hacker - (SVP)

Citigroup, Inc.

Greater London

Hybrid

GBP 70,000 - 90,000

Yesterday

Be an early applicant

DevSecOps (Erlang/Elixir + General Software)

IO Associates

Cheltenham

Hybrid

GBP 40,000 - 60,000

Yesterday

Be an early applicant

Senior Cloud Engineer

Skipton Building Society

South Yorkshire

Hybrid

GBP 60,000 - 80,000

Yesterday

Be an early applicant

Principal Engineer

Games Jobs Direct

Greater London

On-site

GBP 80,000 - 100,000

Yesterday

Be an early applicant

Staff Cloud DevOps Engineer

Entrust

Greater London

On-site

GBP 60,000 - 80,000

Yesterday

Be an early applicant

Matillion Engineer

Peterborough Limited

Oundle

On-site

GBP 70,000 - 90,000

Yesterday

Be an early applicant

Configuration Engineer

The Wave

Hemel Hempstead

Remote

GBP 113,000 - 134,000

Yesterday

Be an early applicant

Technical Lead

IBM

Abbots Worthy

On-site

GBP 70,000 - 90,000

Yesterday

Be an early applicant

Java Software Developer

IBM

Abbots Worthy

On-site

GBP 50,000 - 70,000

Yesterday

Be an early applicant

L2 Linux Engineer Ops Centre - Assoc Manager / Sr Analyst

WeAreTechWomen

Greater London

Hybrid

GBP 50,000 - 75,000

Yesterday

Be an early applicant

Senior Infrastructure Engineer

Push Gaming Limited

City of Westminster

On-site

GBP 80,000 - 100,000

2 days ago

Be an early applicant

Senior DevOps Engineer (f/m/d)

think project! GmbH

Reading

Hybrid

GBP 70,000 - 90,000

2 days ago

Be an early applicant

Senior Golang Ethical Hacker - (SVP)

PowerToFly

Greater London

Hybrid

GBP 40,000 - 60,000

2 days ago

Be an early applicant

Technical Lead

Esure Group PLC

Easebourne

Hybrid

GBP 100,000 - 125,000

2 days ago

Be an early applicant

Technical Cyber Security Lead

Genus Plc

Chester

Hybrid

GBP 70,000 - 85,000

2 days ago

Be an early applicant

Platform Site Reliability Engineer at AI infrastructure platform startup

Jack & Jill/External ATS

Remote

GBP 70,000 - 90,000

Full time

6 days ago

Be an early applicant

Job summary

A fast-growing AI infrastructure startup is looking for a Platform Site Reliability Engineer to enhance its platform. The role involves deploying and optimizing Kubernetes for AI workloads and ensuring system stability, performance, and security in a 24/7 production environment. The ideal candidate will have extensive experience in performance-critical environments and strong Linux expertise. This position offers a chance to work at the forefront of AI infrastructure in a well-funded startup.

Qualifications

5+ years’ experience in performance-critical SRE environments with 24/7 operations.
3+ years’ hands-on experience deploying and running orchestration platforms.
Expert-level Linux administration, especially Ubuntu.

Responsibilities

Deploy, operate, and scale Kubernetes clusters for AI-centric workloads.
Optimize Linux systems and build automation for platform lifecycle management.
Maintain observability and reliability in 24/7 production environments.

Skills

Kubernetes expertise

Linux administration

System tuning skills

Networking fundamentals

Tools

Prometheus

Grafana

This is a job that we are recruiting for on behalf of one of our customers.

To apply, speak to Jack. He's an AI agent that sends you unmissable jobs and then helps you ace the interview. He'll make sure you are considered for this role, and help you find others if you ask.

Platform Site Reliability Engineer

Company Description

A fast-growing AI infrastructure platform startup building the backbone for next-generation AI workloads, connecting software and hardware at scale in a highly technical, mission-critical environment.

Job Description

As a Platform Site Reliability Engineer, you will own and evolve a highly available AI infrastructure platform, ensuring stability, security, and performance across bare-metal, virtualization, and orchestration layers. You’ll deploy and optimize Kubernetes for AI workloads, drive automation, manage incidents, and mentor others while supporting a 24/7 production environment.

Location

Gloucestershire, UK

Why this role is remarkable

Work at the forefront of AI infrastructure, bridging hardware and software for cutting-edge AI workloads
Operate and scale complex bare-metal, virtualized, and Kubernetes-based platforms
Make a meaningful impact on reliability, automation, and team capability within a well-funded startup

What you will do

Deploy, operate, and scale Kubernetes clusters supporting AI-centric workloads
Optimize Linux systems and build automation for platform lifecycle management and incident response
Maintain observability and reliability using tools such as Prometheus and Grafana in 24/7 production environments

The ideal candidate

5+ years’ experience in globally scaled, performance-critical SRE environments with 24/7 operations
3+ years’ hands-on experience deploying and running orchestration platforms, with deep Kubernetes expertise
Expert-level Linux administration (especially Ubuntu), strong system tuning skills, and solid networking fundamentals

How to Apply

To apply for this job, speak to Jack, our AI recruiter.

Step 1. Visit our website.
Step 2. Click 'Speak with Jack'.
Step 3. Log in with your LinkedIn profile.
Step 4. Talk to Jack for 20 minutes so he can understand your experience and ambitions.
Step 5. If the hiring manager would like to meet you, Jack will make the introduction.

* The salary benchmark is based on the target salaries of market leaders in their relevant sectors. It is intended as a guide to help Premium Members assess open positions and negotiate salaries. The benchmark is not provided by the hiring company, and the actual salary could be significantly higher or lower.