AI Scenario Writer & Evaluation Engineer (Project-Based)

Mindrift

Remote

USD 10,000 - 60,000

Part time

Today

Job summary

A tech company specializing in AI is seeking software engineers for project-based roles testing and improving AI systems. Responsibilities include creating test cases and analyzing results. This is a part-time, non-permanent opportunity that pays up to $80/hour, depending on expertise. Contributors set their own schedules; tasks are estimated to take 6-10 hours. Ideal candidates have strong Python skills and a good understanding of LLM limitations.

Qualifications

  • 3+ years of software development experience, ideally focusing on Python.
  • Experience with Git and managing code repositories.
  • Comfortable with structured formats like JSON and YAML.
  • Understanding of LLM limitations, bias, and context limits.
  • Familiarity with Docker environments.
  • Proficient in English at B2 level.

Responsibilities

  • Create structured test cases that simulate complex human workflows.
  • Define gold-standard behavior and scoring logic to evaluate agents.
  • Analyze logs, failure modes, and decision paths.
  • Validate scenarios using code repositories and test frameworks.
  • Iterate on prompts and test cases for improvement.
  • Ensure scenarios are production-ready and reusable.

Skills

Software development experience
Strong Python focus
Experience with Git
Comfortable with JSON/YAML
Understanding LLM limitations
Familiarity with Docker
English proficiency - B2