Remote Fault-Tolerant LLM Pre-Training Engineer

poolside

Remote

GBP 60,000 - 100,000

Full time

29 days ago

Job summary

A leading AI technology company in the UK seeks a skilled engineer to join their pre-training team. This fully remote role requires strong programming skills, especially in Python and C/C++, as well as a solid understanding of Large Language Models. Major responsibilities include troubleshooting hardware during training and developing tools for recovery. Competitive benefits include flexible hours and a focus on team wellbeing.

Benefits

Fully remote work & flexible hours
37 days/year of vacation & holidays
Health insurance allowance for you and dependents
Company-provided equipment
Wellbeing and learning allowances
Frequent team get-togethers
Diverse & inclusive culture

Qualifications

  • Strong engineering skills required.
  • Basic understanding of LLM training and inference principles.
  • Strong algorithmic skills and attention to code quality.

Responsibilities

  • Identify and troubleshoot hardware problems during training.
  • Minimize GPU idle time during faults.
  • Design and develop tools to accelerate training recovery.
  • Improve performance and reliability of checkpointing.
  • Write high-quality code in Python, C/C++, and CUDA.

Skills

Understanding of Large Language Models (LLMs)
Strong engineering background
Programming experience
Distributed systems

Tools

Linux API
Python with NumPy, PyTorch, or JAX
C/C++
NCCL
Kubernetes (K8s) stack