Applied Scientist - Post-training

Trades Workforce Solutions

South Bay, California, United States

On-site

USD 200,000 - 250,000

Full time

Job summary

A leading AI research firm seeks researchers or recent PhDs to advance post-training methods for large language models. The role involves improving alignment and instruction-tuning approaches, designing evaluation frameworks, and publishing relevant research. Ideal candidates will have a track record of rigorous work and a genuine curiosity about LLM behaviours. This full-time position is based in South Bay, offering competitive compensation of $200,000 – $250,000 base along with equity and bonuses.

Qualifications

  • Experience in alignment or optimisation of LLMs.
  • A track record of published research or open-source projects.
  • Ability to work autonomously in a collaborative setup.

Responsibilities

  • Develop post-training methods for LLMs.
  • Advance instruction-tuning and preference-learning pipelines.
  • Design evaluation frameworks that measure human-centred behaviour.

Skills

  • Experience in LLM post-training
  • Research skills
  • Curiosity about model behaviour
  • Technical depth

Education

  • PhD in a relevant field

Job description

How do you make a large language model genuinely human‑centred, capable of reasoning, empathy, and nuance rather than just pattern‑matching?

This team is built to answer that question. They’re a small, focused group of researchers and engineers working on the post‑training challenges that matter most: RLHF, RLAIF, continual learning, multilingual behaviour, and evaluation frameworks designed for natural, reliable interaction.

You’ll work alongside researchers and engineers from NVIDIA, Meta, Microsoft, Apple, and Stanford, in an environment that combines academic rigour with production‑level delivery. Backed by over $400 million in funding, the team has the freedom, compute, and scale to run experiments that push beyond the limits of standard alignment research.

This is a role where your work moves directly into deployed products. The team’s models are live, meaning every insight you develop, every method you refine, and every experiment you run has immediate, measurable impact on how large‑scale conversational systems behave.

What you’ll work on
  • Developing post‑training methods that improve alignment, reasoning, and reliability
  • Advancing instruction‑tuning, RLHF/RLAIF, and preference‑learning pipelines for deployed systems
  • Designing evaluation frameworks that measure human‑centred behaviour, not just accuracy
  • Exploring continual learning and multilingual generalisation for long‑lived models
  • Publishing and collaborating on research that informs real‑world deployment

Who this role suits
  • Researchers or recent PhDs with experience in LLM post‑training, alignment, or optimisation
  • A track record of rigorous work – published papers, open‑source projects, or deployed research
  • Curiosity about how large models learn and behave over time, and how to steer that behaviour safely
  • Someone who values autonomy, clarity of purpose, and research that turns into impact

You’ll find a culture driven by technical depth rather than hype – where thoughtful research is backed by meaningful compute and where the best ideas scale fast.

Location: South Bay (on‑site, collaborative setup)
Compensation: $200,000 – $250,000 base + equity + bonus

If you’re ready to work on post‑training research that shapes how large language models behave, we’d love to hear from you.

All applicants will receive a response.
