A technology firm in London seeks a Software Engineer to enhance capabilities in evaluating Large Language Models. The role involves building critical tools and libraries, leading projects, and collaborating with researchers to ensure reliable experimental outcomes. Candidates should have 5+ years in software engineering and solid Python skills. This is a full-time position requiring in-person presence, with a focus on shaping the internal software platform.
Application deadline: Our hiring cycle for 2025 has concluded for now. New applications will be considered from 2026 onwards.
The capabilities of current AI systems are evolving at a rapid pace. While these advancements offer tremendous opportunities, they also present significant risks, such as the potential for deliberate misuse or the deployment of sophisticated yet misaligned models. At Apollo Research, our primary concern lies with deceptive alignment, a phenomenon where a model appears to be aligned but is, in fact, misaligned and capable of evading human oversight.
Our approach focuses on behavioral model evaluations, which we then use to audit real‑world models. We also combine black‑box approaches with applied interpretability. In our evaluations, we focus on LM agents, i.e. LLMs with agentic scaffolding similar to AIDE or SWE-agent. We also study model organisms in controlled environments, e.g. to better understand capabilities related to scheming.
At Apollo, we aim for a culture that emphasizes truth‑seeking, being goal‑oriented, giving and receiving constructive feedback, and being friendly and helpful. If you’re interested in more details about what it’s like working at Apollo, you can find more information here.
We're seeking a Software Engineer who will enhance our capability to evaluate Large Language Models (LLMs) through building critical tools and libraries for our Evals team. Your work will directly impact our mission to make AI systems safer and more aligned.
You must have experience writing production‑quality Python code. We are looking for strong generalist software engineers with a track record of taking ownership. Candidates may demonstrate these skills in different ways. For example, you might have one or more of the following:
The following experience would be a bonus:
We want to emphasize that people who feel they don’t fulfill all of these characteristics but think they would be a good fit for the position nonetheless are strongly encouraged to apply. We believe that excellent candidates can come from a variety of backgrounds and are excited to give you opportunities to shine.
The current evals team consists of Mikita Balesni, Jérémy Scheurer, Alex Meinke, Rusheb Shah, Bronson Schoen, Andrei Matveiakin, Felix Hofstätter, and Axel Højmark. Marius Hobbhahn manages and advises the team, though team members lead individual projects. You would work closely with Rusheb and Andrei, who are the full‑time software engineers on the evals team, but you would also interact a lot with everyone else. You can find our full team here.
Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.
Please complete the application form with your CV. A cover letter is optional. Please also feel free to share links to relevant work samples.
Our multi‑stage process includes a screening interview, a take‑home test (approx. 2 hours), three technical interviews, and a final interview with Marius (CEO). The technical interviews will be closely related to tasks the candidate would do on the job. There are no leetcode‑style general coding interviews. If you want to prepare for the interviews, we suggest working on hands‑on LLM evals projects (e.g. as suggested in our starter guide), such as building LM agent evaluations in Inspect.
We are committed to protecting your data, ensuring fairness, and adhering to workplace fairness principles in our recruitment process. To enhance hiring efficiency, we use AI‑powered tools to assist with tasks such as resume screening. These tools are designed and deployed in compliance with internationally recognized AI governance frameworks.
Your personal data is handled securely and transparently. We adopt a human‑centred approach: all resumes are screened by a human and final hiring decisions are made by our team. If you have questions about how your data is processed or wish to report concerns about fairness, please contact us at info@apolloresearch.ai.
Thank you very much for applying to Apollo Research.