Machine Learning Infrastructure Engineer

ZipRecruiter

City of London

On-site

GBP 100,000 - 130,000

Full time

Today

Job summary

A pioneering AI startup in London seeks an experienced engineer to own its ML infrastructure. The role involves designing scalable cloud architecture and managing GPU clusters. Ideal applicants will have a strong background in building ML systems from scratch and will work closely with researchers to optimise workflows. The salary range is £100k–£130k, flexible for strong profiles.

Qualifications

  • Experience in building ML infrastructure and cloud architecture from scratch.
  • Familiarity with designing and deploying scalable, high-performance cloud infrastructure.
  • Proficiency in setting up and optimising containerised workflows.

Responsibilities

  • Design and deploy scalable cloud infrastructure for ML workloads.
  • Build and manage GPU clusters and distributed training environments.
  • Implement monitoring, incident response, and CI/CD practices.

Skills

Building ML infrastructure
Cloud architecture
Docker
Kubernetes
Terraform
Python

Tools

AWS
GCP
Azure
MLflow
Prometheus
Grafana

Job description

Overview

Do you want to own the ML infrastructure at a frontier AI startup?

Have you built cloud and ML systems from scratch, not just maintained them?

Are you ready to shape the backbone of 3D generative AI?

SpAItial is pioneering the development of a frontier 3D foundation model, combining cutting-edge AI, computer vision, and spatial computing to redefine how industries — from robotics and AR/VR to gaming and film — generate and interact with 3D content. Backed by £13m in seed funding, with half allocated to compute, SpAItial is a 10-person research-focused team moving fast towards a public demo later this year.

Responsibilities
  • Design and deploy scalable, high-performance cloud infrastructure for ML workloads
  • Build and manage GPU clusters, storage systems, and distributed training environments
  • Set up and optimise containerised workflows (Docker, Kubernetes, Terraform)
  • Implement robust monitoring, incident response, and CI/CD practices
  • Collaborate closely with researchers to integrate and scale experiments

Candidates must have experience building ML infrastructure and cloud architecture from scratch.

Key Details
  • Salary: £100k–£130k (flexible for strong profiles)
  • Working Model: On-site, London
  • Tech Stack: AWS/GCP/Azure, Kubernetes, Docker, Terraform, Python, MLflow/Prometheus/Grafana

If you want to shape the backbone of one of Europe’s most ambitious AI startups, we’d love to hear from you.
