ML Infrastructure Engineer

Millennium

London

On-site

GBP 60,000 - 100,000

Full time

30+ days ago

Job summary

Join a forward-thinking company as an ML Infrastructure Engineer, where you'll play a pivotal role in shaping AI/ML infrastructure solutions. This hands-on position involves collaborating with teams to architect, develop, and maintain cutting-edge AI/ML solutions in both cloud and on-premise environments. You'll leverage your expertise in cloud infrastructure, automation tools like Terraform, and AI/ML frameworks to deliver robust solutions that enhance performance and security. If you're passionate about technology and thrive in a dynamic environment, this is the perfect opportunity to make a significant impact in the AI/ML landscape.

Qualifications

  • 6+ years of experience in cloud environments and AI/ML solutions.
  • Strong skills in Terraform and Python for infrastructure as code.

Responsibilities

  • Architect and maintain AI/ML infrastructure components and solutions.
  • Implement CI/CD pipelines and automate systems configuration.

Skills

Cloud Infrastructure
AI/ML Solutions Development
Terraform
Python
Linux Systems Administration
CI/CD Tools
Containerization
Networking
Security
Automated Configuration

Tools

Terraform
Chef
Ansible
Salt
TensorFlow
PyTorch
Ray
Dask
vLLM
KFServing

Job description

This role sits within the AI/ML Infrastructure Engineering team and is dedicated to implementing and supporting AI/ML infrastructure solutions in cloud and on-premise environments. The role works directly with infrastructure teams and may also work with data scientists, machine learning engineers, application developers, and quantitative analysts, acting as both a solutions architect and a professional services engineer.

This is a hands-on developer role. Ideal candidates have deployed and supported their own production-ready AI/ML models in cloud environments and have automated the build and management of a broad range of cloud infrastructure using tools like Terraform. Candidates should be familiar with developing unit and functional tests, have experience designing and implementing CI/CD tooling with infrastructure-as-code pipelines, and have knowledge of Linux systems administration, containerization, networking, security, automated configuration and state management, cross-system orchestration, configuration management, logging, metrics, monitoring, and alerting.

Principal Responsibilities:

  1. Architect, develop and maintain internal AI/ML infrastructure components, frameworks, and offerings
  2. Architect, develop and maintain AI/ML solutions for customers in cloud environments
  3. Help customers architect, develop and maintain their own AI/ML solutions in cloud environments
  4. Implement CI/CD pipelines which include application tests, security tests, and gates
  5. Implement availability, security, performance monitoring, and alerting of AI/ML solutions
  6. Automate data resiliency and replication for AI/ML models
  7. Manage multiple environments and promote code between them
  8. Automate systems configuration and orchestration using tools such as Terraform, Chef, Ansible, or Salt
  9. Automate creation of machine images and containers

Required Qualifications/Skills:

  1. 6+ years of experience designing and supporting production cloud environments
  2. Experience consulting with customers to develop AI/ML solutions
  3. Experience developing collaboratively, including infrastructure as code, preferably in Python
  4. Systems engineering knowledge, including understanding of Linux, security, and networking
  5. Experience with cloud templating tools such as Terraform
  6. Experience with AI/ML frameworks (e.g., TensorFlow, PyTorch)
  7. Experience with distributed computing tools (e.g., Ray, Dask)
  8. Experience with model serving tools (e.g., vLLM, KFServing)
  9. Experience with building, monitoring, and alerting on logs and metrics
  10. Knowledge of cloud networking, including connectivity, routing, DNS, VPCs, proxies, and load balancers
  11. Knowledge of cloud security, including IAM, certificate management, and key management
  12. Excellent written and verbal communication skills
  13. Excellent troubleshooting and analytical skills
  14. Self-starter able to execute independently, on a deadline, and under pressure