AI Agent Evaluation Analyst - AI Trainer

Mindrift

United Kingdom

Remote

GBP 60,000 - 80,000

Part time

Today

Job summary

A future-focused AI consultancy is looking for QA contributors to validate and improve AI agents' evaluation frameworks. Ideal candidates will possess strong analytical skills and attention to detail while enjoying a flexible, remote work environment. Responsibilities include reviewing evaluation tasks, defining expected behaviors for agents, and collaborating with cross-functional teams to enhance AI systems. This role provides an opportunity to influence future AI technologies and build a unique portfolio.

Qualifications

  • Ability to reason about complex systems and scenarios.
  • Skilled in identifying inconsistencies and vague requirements.
  • Experience with AI, policy evaluation, or logic puzzles.

Responsibilities

  • Review evaluation tasks for logic and completeness.
  • Help define expected behaviors for AI agents.
  • Work closely with teams to suggest refinements.

Skills

Analytical thinking
Attention to detail
Communication skills
Familiarity with structured data formats
Understanding of QA processes

Job description

Who we’re looking for

We’re looking for curious and intellectually proactive contributors who double‑check assumptions and play devil’s advocate. Are you comfortable with ambiguity and complexity? Does an async, remote, flexible opportunity sound exciting? Would you like to learn how modern AI systems are tested and evaluated?

About the project

We’re looking for QA contributors for autonomous AI agents on a new project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout the project, you’ll balance quality assurance, research, and logical problem‑solving. This opportunity is ideal for people who enjoy looking at systems holistically and thinking through scenarios, implications, and edge cases.

What you’ll be doing

  • Review evaluation tasks and scenarios for logic, completeness, and realism.
  • Identify inconsistencies, missing assumptions, or unclear decision points.
  • Help define clear expected behaviours (gold standards) for AI agents.
  • Annotate cause‑effect relationships, reasoning paths, and plausible alternatives.
  • Think through complex systems and policies as a human would to ensure agents are tested properly.
  • Work closely with QA, writers, or developers to suggest refinements or edge‑case coverage.

How to get started

Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Qualifications

  • Excellent analytical thinking: reason about complex systems, scenarios, and logical implications.
  • Strong attention to detail: spot contradictions, ambiguities, and vague requirements.
  • Familiarity with structured data formats: comfortable reading JSON/YAML.
  • Can assess scenarios holistically: identify what’s missing, unrealistic, or might break.
  • Good communication and clear writing (in English) to document findings.
  • Experience with policy evaluation, logic puzzles, case studies, or structured scenario design.
  • Background in consulting, academia, olympiads (logic/math/informatics), or research.
  • Exposure to LLMs, prompt engineering, or AI‑generated content.
  • Familiarity with QA or test‑case thinking (edge cases, failure modes, “what could go wrong”).
  • Some understanding of how scoring or evaluation works in agent testing (precision, coverage, etc.).
  • Open to a part‑time, non‑permanent, flexible, remote freelance project that fits around existing commitments.
  • Desire to participate in an advanced AI project and enhance your portfolio, influencing how future AI models understand and communicate in your field of expertise.