Mindrift
A leading AI development firm is seeking a candidate to design realistic evaluation scenarios for LLM-based agents. As part of this fully remote freelance role, you will create and optimize test cases that simulate human tasks, ensuring clarity and effectiveness. Ideal candidates should possess a degree in a relevant field and have strong analytical skills. Flexible working hours and competitive rates await the right individual to contribute to innovative AI projects.
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.
At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.
We're looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.
Although every project is unique, you might typically:
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.