
A leading AI research institute is seeking a Software Engineer to enhance capabilities in evaluating Large Language Models (LLMs). You'll build vital tools for research, ensuring accurate experimental results and improving internal software platforms. Requires experience in production-quality Python, software development, and teamwork in an innovative environment. The role offers competitive UK-based salary, flexible hours, unlimited vacation, and opportunities for professional development.
Applications deadline: Our hiring cycle for 2025 has concluded for now. New applications will be considered from 2026 onwards.
The capabilities of current AI systems are evolving at a rapid pace. While these advancements offer tremendous opportunities, they also present significant risks, such as the potential for deliberate misuse or the deployment of sophisticated yet misaligned models. At Apollo Research, our primary concern lies with deceptive alignment, a phenomenon where a model appears to be aligned but is, in fact, misaligned and capable of evading human oversight.
Our approach focuses on behavioral model evaluations, which we then use to audit real-world models. We also combine black-box approaches with applied interpretability. In our evaluations, we focus on LM agents, i.e. LLMs with agentic scaffolding similar to AIDE or SWE agent. We also study model organisms in controlled environments (see our security policies), e.g. to better understand capabilities related to scheming.
At Apollo, we aim for a culture that emphasizes truth-seeking, being goal-oriented, giving and receiving constructive feedback, and being friendly and helpful. If you're interested in more details about what it's like working at Apollo, you can find more information here.
We're seeking a Software Engineer who will enhance our capability to evaluate Large Language Models (LLMs) by building critical tools and libraries for our Evals team. Your work will directly impact our mission to make AI systems safer and more aligned.
You must have experience writing production-quality Python code. We are looking for strong generalist software engineers with a track record of taking ownership. Candidates may demonstrate these skills in different ways. For example, you might have one or more of these:
We want to emphasize that people who feel they don't fulfill all of these characteristics, but think they would be a good fit for the position nonetheless, are strongly encouraged to apply. We believe that excellent candidates can come from a variety of backgrounds and are excited to give you opportunities to shine.
The current evals team consists of Mikita Balesni, Jérémy Scheurer, Alex Meinke, Rusheb Shah, Bronson Schoen, Andrei Matveiakin, Felix Hofstätter, and Axel Højmark. Marius Hobbhahn manages and advises the team, though team members lead individual projects. You would work closely with Rusheb and Andrei, who are the full-time software engineers on the evals team, but you would also interact a lot with everyone else. You can find our full team here.
EVALS TEAM WORK. The evals team focuses on the following efforts:
Equality Statement: Apollo Research is an Equal Opportunity Employer. We value diversity and are committed to providing equal opportunities to all, regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, or sexual orientation.
How to apply: Please complete the application form with your CV. A cover letter is optional. Please also feel free to share links to relevant work samples.
About the interview process: Our multi-stage process includes a screening interview, a take-home test (approx. 2 hours), 3 technical interviews, and a final interview with Marius (CEO). The technical interviews will be closely related to tasks the candidate would do on the job. There are no leetcode-style general coding interviews. If you want to prepare for the interviews, we suggest working on hands-on LLM evals projects (e.g. as suggested in our starter guide), such as building LM agent evaluations in Inspect.
We are committed to protecting your data, ensuring fairness, and adhering to workplace fairness principles in our recruitment process. To enhance hiring efficiency, we use AI-powered tools to assist with tasks such as resume screening. These tools are designed and deployed in compliance with internationally recognized AI governance frameworks.
Your personal data is handled securely and transparently. We adopt a human-centred approach: all resumes are screened by a human, and final hiring decisions are made by our team. If you have questions about how your data is processed or wish to report concerns about fairness, please contact us.
Thank you very much for applying to Apollo Research.
Employment Type: Full-Time
Vacancy: 1