Freelance Agent Evaluation Engineer

Mindrift

Remote

EUR 60,000 - 80,000

Part-time

Today

Summary

A leading tech consultancy in Berlin is seeking software engineers for project-based AI roles. Contributors will create structured test cases, define evaluation criteria, and analyze agent behavior. The ideal candidate has more than 3 years of software development experience with a strong Python focus, is familiar with Git, and understands LLM limitations. The role offers flexible working hours, with pay rates of up to $50/hour depending on task complexity and expertise.

Qualifications

  • 3+ years of software development experience, ideally with a strong Python focus.
  • Experience using Git and code repositories.
  • Familiarity with structured formats like JSON or YAML.

Responsibilities

  • Create structured test cases for complex workflows.
  • Define gold-standard behavior to evaluate agent actions.
  • Analyze logs and decision paths for improvements.

Skills

Python programming
Git
JSON/YAML
Understanding LLM limitations
Docker
B2 English proficiency

Job description

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves

While each project involves unique tasks, contributors may:

  • Create structured test cases that simulate complex human workflows
  • Define gold-standard behavior and scoring logic to evaluate agent actions
  • Analyze agent logs, failure modes, and decision paths
  • Work with code repositories and test frameworks to validate scenarios
  • Iterate on prompts, instructions, and test cases to improve clarity and difficulty
  • Ensure that scenarios are production-ready, easy to run, and reusable
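
To make the first two bullets more concrete, here is a minimal sketch of what a structured test case with gold-standard behavior and scoring logic might look like. It is purely illustrative and not Mindrift's actual format: the scenario fields (`gold_actions`, `forbidden_actions`), the action names, and the `score_trace` function are all hypothetical assumptions.

```python
import json

# Hypothetical scenario description. The field names and action names are
# assumptions made for illustration, not a format taken from the posting.
SCENARIO_JSON = """
{
  "id": "refund-request-001",
  "task": "Process a customer refund request",
  "gold_actions": ["lookup_order", "check_refund_policy", "issue_refund"],
  "forbidden_actions": ["delete_order"]
}
"""


def score_trace(scenario: dict, trace: list[str]) -> float:
    """Score an agent's action trace against the gold-standard actions.

    Returns 0.0 if any forbidden action was taken. Otherwise returns the
    fraction of gold actions that appear in the trace in the expected order
    (extra actions in between are tolerated).
    """
    if any(action in scenario["forbidden_actions"] for action in trace):
        return 0.0
    gold = scenario["gold_actions"]
    matched = 0
    for action in trace:
        # Advance only when the next expected gold action is observed.
        if matched < len(gold) and action == gold[matched]:
            matched += 1
    return matched / len(gold)


scenario = json.loads(SCENARIO_JSON)
agent_trace = ["lookup_order", "send_email", "check_refund_policy"]
print(score_trace(scenario, agent_trace))
```

In this sketch the example trace completes two of the three gold actions in order, so it receives partial credit. A real project would define its own schema and scoring rules.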

What we look for

This opportunity is a good fit for software engineers who are open to part-time, non-permanent projects. Ideally, contributors will have:

  • 3+ years of software development experience with strong Python focus
  • Experience with Git and code repositories
  • Comfort with structured formats like JSON/YAML for scenario descriptions
  • An understanding of core LLM limitations (hallucinations, bias, context limits) and how these affect evaluation design
  • Familiarity with Docker
  • English proficiency at B2 level

How it works

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid

Project time expectations

Tasks for this project are estimated to take 6-10 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.

Payment

  • Paid contributions, with rates up to $50/hour*
  • Fixed project rate or individual rates, depending on the project
  • Some projects include incentive payments

*Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.
