LLM Agent Evaluation Engineer — Remote

Mindrift

Remote

ZAR 300 000 - 400 000

Part time

Yesterday

Job summary

A leading AI innovation company is seeking candidates for a flexible, remote role evaluating LLM-based agents. You will design realistic evaluation scenarios and create structured test cases that simulate human workflows. Applicants need a Bachelor's or Master's degree in Computer Science or a related field, strong analytical skills, and comfort with tools such as Python and JSON. The role offers the chance to influence AI model understanding while contributing to innovative projects.

Benefits

Flexible schedule
Competitive pay up to $24/hour
Chance to contribute to advanced AI projects
Opportunity to enhance your portfolio

Qualifications

  • Background in QA, software testing, data analysis, or NLP annotation.
  • Good understanding of test design principles such as reproducibility and coverage.
  • Basic experience with Python and JS is required.

Responsibilities

  • Create structured test cases that simulate complex human workflows.
  • Define gold‑standard behavior and scoring logic to evaluate agent actions.
  • Analyze agent logs, failure modes, and decision paths.
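As a rough illustration of what "a structured test case with gold-standard behavior and scoring logic" might look like, here is a minimal sketch in Python with a JSON test case. It is not the employer's format. The field names (`id`, `task`, `gold_actions`, `tool`, `args`), the `calendar.*` tool names, and the `score_actions` helper are all hypothetical. The scoring rule (exact match plus position-by-position step accuracy) is one simple choice among many.

```python
import json

# Hypothetical test-case format: the field names ("id", "task",
# "gold_actions", "tool", "args") and tool names are illustrative assumptions.
TEST_CASE_JSON = """
{
  "id": "schedule-meeting-001",
  "task": "Schedule a 30-minute meeting with Alice on Tuesday at 10:00",
  "gold_actions": [
    {"tool": "calendar.search", "args": {"person": "Alice"}},
    {"tool": "calendar.create",
     "args": {"person": "Alice", "day": "Tuesday", "time": "10:00", "minutes": 30}}
  ]
}
"""


def score_actions(gold: list[dict], actual: list[dict]) -> tuple[bool, float]:
    """Compare an agent's action log against the gold-standard sequence.

    Returns (exact_match, step_accuracy), where step_accuracy is the fraction
    of gold steps the agent reproduced at the same position.
    """
    # zip stops at the shorter list, so missing steps count as unmatched.
    matched = sum(1 for g, a in zip(gold, actual) if g == a)
    step_accuracy = matched / len(gold) if gold else 1.0
    return gold == actual, step_accuracy


if __name__ == "__main__":
    case = json.loads(TEST_CASE_JSON)
    # A logged agent run that booked the wrong time on its second step.
    agent_log = [
        {"tool": "calendar.search", "args": {"person": "Alice"}},
        {"tool": "calendar.create",
         "args": {"person": "Alice", "day": "Tuesday", "time": "11:00", "minutes": 30}},
    ]
    print(score_actions(case["gold_actions"], agent_log))  # (False, 0.5)
```

A real evaluation would likely need looser matching (for example, ignoring argument order or accepting equivalent tool calls), but strict comparison keeps the sketch reproducible.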

Skills

Analytical mindset
Attention to detail
Strong written communication skills in English
Curious and open to working with AI-generated content

Education

Bachelor's and/or Master's degree in Computer Science or a related field

Tools

Python
JavaScript
JSON/YAML