

Machine Learning Engineer, Platform

AION

Greater London

On-site

GBP 70,000 - 90,000

Full time

Today


Job summary

A technology company in Greater London is seeking a hands-on ML engineer to design and optimize large language models (LLMs). The role involves end-to-end pipeline creation for model training and evaluation, fine-tuning LLMs using techniques like LoRA and QLoRA, and implementing monitoring for model quality. The ideal candidate should have 4-6 years of experience in ML engineering, strong skills in data preparation, and a drive to build impactful models. Apply now for an opportunity to join a visionary team transforming high-performance computing.

Qualifications

  • 4-6 years of experience in machine learning engineering.
  • Experience building LLMs and ML deployment pipelines.
  • Strong understanding of the ML lifecycle, from data to deployed models.

Responsibilities

  • Design and implement LLMOps pipelines for model training and evaluation.
  • Optimize model accuracy and training performance through experimentation.
  • Deploy models using vLLM with multi-adapter LoRA serving and inference optimizations.

Skills

Building and fine-tuning large language models
Model evaluation and optimization
Data preparation
Experience with LoRA and QLoRA techniques
Experience with RLHF pipelines
Strong ownership and initiative

Tools

HuggingFace Transformers
Unsloth
Axolotl

Job description

About AION

AION is building an interoperable, decentralized AI cloud platform for high-performance computing (HPC). Purpose-built for bare-metal performance, AION democratizes access to compute and provides managed services. It aims to be an end-to-end AI lifecycle platform, taking organizations from data to deployed models through its forward-deployed engineering approach.

AI is transforming every business around the world, and the demand for compute is surging like never before. AION strives to be the gateway for dynamic compute workloads by building integration bridges with diverse data centers around the world and reinventing the compute stack with its state-of-the-art serverless technology. Enterprises are finding it hard to balance AI adoption with security. At AION, we take enterprise security and compliance very seriously and are rethinking every piece of infrastructure, from hardware and network packets to API interfaces.

Led by high‑pedigree founders with previous exits, AION is well‑funded by major VCs with strategic global partnerships. Headquartered in the US with global presence, the company is building its initial core team in India / UK.

Who You Are

You're a hands‑on ML engineer with 4‑6 years of experience building and fine‑tuning large language models (LLMs) and transformer‑based models. You're execution‑focused and thrive on solving challenging problems at the intersection of machine learning research and production systems.

You're comfortable working across the ML development lifecycle—from data preparation and model fine‑tuning to evaluation and optimization. You understand both what makes a model perform well and how to systematically improve model quality through experimentation. Experience with LLM fine‑tuning (LoRA, QLoRA), RLHF pipelines, and comprehensive model evaluation is highly desirable. You bring strong ownership, initiative, and the drive to build production‑ready ML models that impact thousands of developers globally.

Requirements

What You'll Do

ML Model Development & Optimization
  • Design and implement end‑to‑end LLMOps pipelines for model training, fine‑tuning, and evaluation
  • Fine‑tune and customize LLMs (Llama, Mistral, Gemma, etc.) using full fine‑tuning and PEFT techniques (LoRA, QLoRA) with tools like Unsloth, Axolotl, and HuggingFace Transformers
  • Implement RLHF (Reinforcement Learning from Human Feedback) pipelines for model alignment and preference optimization
  • Design experiments for automated hyperparameter tuning, training strategies, and model selection
  • Prepare and validate training datasets—ensuring data quality, preprocessing, and format correctness
  • Build comprehensive model evaluation systems with custom metrics (BLEU, ROUGE, perplexity, accuracy) and develop synthetic data generation pipelines
  • Optimize model accuracy, token efficiency, and training performance through systematic experimentation
  • Design and maintain prompt engineering workflows with version control systems
  • Deploy models using vLLM with multi‑adapter LoRA serving, hot‑swapping, and basic optimizations (speculative decoding, continuous batching, KV cache management)

ML Operations & Technical Leadership
  • Set up ML‑specific monitoring for model quality, drift detection, and performance tracking with automated retraining triggers
  • Manage model versioning, artifact storage, lineage tracking, and reproducibility using experiment tracking tools
  • Debug production model issues and optimize cost‑performance trade‑offs for training and inference
  • Partner with infrastructure engineers on ML‑specific compute requirements and deployment pipelines
  • Document model development processes and share knowledge through internal tech talks

Technical Skills & Experience

If you are meeting some of these requirements and feel comfortable catching up on others, we definitely recommend you
