A leading research organization in London seeks a Research Scientist to evaluate AI methods, with an emphasis on long-horizon agents and inference-compute scaling. Candidates with a PhD and strong ML experience are encouraged to apply. This role offers the opportunity to work on impactful projects, collaborate with domain experts, and ensure rigorous evaluation methodologies. A competitive salary and benefits are provided, including flexible hybrid working options and extensive professional development support.
London, UK
The AI Security Institute is the world's largest and best-funded team dedicated to understanding advanced AI risks and translating that knowledge into action. We’re in the heart of the UK government with direct lines to No. 10 (the Prime Minister's office), and we work with frontier developers and governments globally.
We’re here because governments are critical for advanced AI going well, and UK AISI is uniquely positioned to mobilise them. With our resources, unique agility and international influence, this is the best place to shape both AI development and government action.
The deadline for applying to this role is 22 February 2026, end of day, anywhere on Earth.
AISI's Science of Evaluation team develops rigorous techniques for measuring and forecasting AI capabilities, ensuring evaluation results are robust, meaningful, and useful for governance.
Evaluations underpin both scientific understanding and policy decisions about frontier AI. Yet current methodologies are poorly equipped to surface what matters most: underlying capabilities, dangerous failure modes, forecasts of future performance, and robustness across settings. We address this gap by stress-testing the claims and methods in AISI’s testing reports, improving evaluation methods, and building new analytical tools. Our research is problem-driven, methodologically grounded, and focused on impact. We aim to improve epistemic rigour and increase confidence in the claims drawn from evaluation data.
Our core work spans three areas:
(1) Methodological red teaming: Independently auditing evidence and claims in evaluation reports shared with model developers.
(2) Consulting partnerships: Collaborating with AISI evaluation teams to improve methodologies and practices.
(3) Targeted research bets: Pursuing foundational work that enables new insights into model capabilities.
New research agenda focus (in addition to core team responsibilities):
Frontier agents increasingly use massive inference budgets on complex, long-horizon tasks. This makes measuring model horizons, estimating performance ceilings, and maintaining research velocity harder and more expensive. We're developing evaluation methods that remain informative as task budgets exceed 10M tokens per attempt and model horizons surpass the longest available tasks.
This research scientist role focuses on evaluation methods for frontier AI, with emphasis on long-horizon agents and inference-compute scaling.
You’ll design and conduct experiments that extract deeper signal from evaluation data, uncovering underlying capabilities. You’ll collaborate with engineers and domain experts across AISI and with external partners. Researchers on this team have substantial autonomy to shape independent agendas and push the frontier of what evaluations can reveal.
We're flexible on exact background and expect successful candidates to meet many (but not necessarily all) of the criteria below. Depending on experience, we'll consider candidates at Research Scientist or Senior Research Scientist level. We also welcome applications from earlier-career researchers (2–3 years of hands-on LLM experience) who demonstrate creative and rigorous empirical instincts.
*These benefits apply to direct employees. Benefits may differ for individuals joining through other employment arrangements such as secondments.
Annual salary is benchmarked to role scope and relevant experience. Most offers land between £65,000 and £145,000, made up of a base salary plus a technical allowance (take-home salary = base + technical allowance). An additional 28.97% employer pension contribution is paid on the base salary.
This role sits outside the DDaT pay framework, given that its scope requires in-depth technical expertise in frontier AI safety, robustness and advanced AI architectures.
In accordance with the Civil Service Commission rules, the following list contains all selection criteria for the interview process.
The interview process may vary from candidate to candidate; however, you should expect a typical process to include some technical proficiency tests, discussions with a cross-section of our team at AISI (including non-technical staff), and conversations with your workstream lead. The process will culminate in a conversation with members of the senior team here at AISI.
Candidates should expect to go through some or all of the following stages once an application has been submitted: