Platform Engineer

CATCHES

Remote

GBP 80,000 - 100,000

Full time

Yesterday

Job summary

A luxury fashion tech company is seeking a Platform Engineer to manage and optimise next-generation high-performance computing infrastructure. Responsibilities include automating GPU cluster provisioning, maintaining Linux environments for AI/ML workloads, and optimising the stack for containerised workloads. This fully remote position offers a high-trust environment that values innovation and creativity in tech and fashion. Ideal candidates will have a strong background in Linux systems administration and experience with IaC tools such as Terraform and Ansible.

Benefits

Co-working allowances
Remote-first environment
High-trust culture

Qualifications

  • Strong background in Linux Systems Administration.
  • Experience managing bare-metal servers (on-premise or Packet / Equinix Metal).
  • Proficiency in Infrastructure as Code (IaC) tools.

Responsibilities

  • Automate provisioning and lifecycle of GPU clusters using Terraform and Ansible.
  • Maintain stability of large-scale Linux environments supporting AI/ML workloads.
  • Collaborate to troubleshoot hardware and networking issues.

Skills

Linux Systems Administration
Infrastructure as Code (IaC) tools
High-throughput networking

Tools

Terraform
Ansible
Prometheus
Grafana
Kubernetes
Docker

Job description

About

CATCHES is backed by some of the most influential names in luxury fashion globally. We blend advanced 3D rendering, AI, and VFX techniques to deliver unparalleled shopping experiences for luxury fashion.

Role

We are hiring a Platform Engineer to manage and optimise our next-generation high-performance computing infrastructure. Move beyond standard cloud instances and manage the raw power of bare metal GPU clusters.

Responsibilities

  • Automate the provisioning and lifecycle of high-performance GPU clusters using Terraform and Ansible.
  • Maintain the stability and performance of large-scale Linux environments supporting AI / ML training workloads.
  • Collaborate with vendors and internal teams to troubleshoot hardware and networking bottlenecks (latency, throughput).
  • Implement monitoring solutions (Prometheus / Grafana) to visualise GPU health and cluster efficiency.
  • Assist in optimising the stack for containerised workloads (Kubernetes / Docker).

Requirements

  • Strong background in Linux Systems Administration.
  • Experience managing bare-metal servers (on-premise or Packet / Equinix Metal).
  • Proficiency in Infrastructure as Code (IaC) tools.
  • Nice to have: exposure to GPUs, InfiniBand, or high-throughput networking (we will train the right candidate).

What working with CATCHES is like

  • Fully remote-first, async-friendly, with optional co-working allowances.
  • High-trust, low-bureaucracy environment that values experimentation and shipping.
  • Early influence on product, architecture, and engineering culture.
  • A combination of cutting-edge tech, luxury-fashion creativity, and games-industry-scale challenges.