
AI Data Scientist

European Tech Recruit

Vitoria

On-site

EUR 50,000 - 70,000

Full-time

Posted today

Vacancy description

A fast-scaling Quantum AI SaaS provider in Spain is hiring an AI Evaluation Data Scientist. This position involves leading evaluation strategies for innovative AI systems, building evaluation pipelines, and curating datasets. Candidates should have an MSc/PhD in a relevant field and experience in applied AI/ML with strong programming skills. The role offers a fixed-term contract with potential for extension.

Background

  • 3+ years (mid-level) or 5+ years (senior) in applied AI / ML with hands-on production experience.
  • Strong background in evaluating LLMs, RAG, or multi-agent systems.
  • Excellent communication skills and a passion for building reliable, intelligent AI systems.

Responsibilities

  • Lead evaluation strategy for Agentic AI and RAG systems.
  • Build reproducible evaluation pipelines to track progress over time.
  • Analyze failures, identify root causes, and drive continuous system improvement.

Skills

Evaluation of LLMs
Python
Machine Learning frameworks
Data curation
Cloud environments

Education

MSc / PhD in CS, ML, Data Science, Engineering

Tools

Docker
Git

Job description

A fast-scaling Quantum AI SaaS provider is hiring an AI Evaluation Data Scientist to help shape the future of Generative AI systems. In this role, you’ll design and lead evaluation frameworks for Agentic AI and RAG systems, ensuring real-world reliability, reasoning quality, and user success before deployment. You’ll work across teams to turn insights into measurable product improvements.

This role is offered on an initial fixed-term contract until the end of June 2026, with the option to extend thereafter.

What You’ll Do
  • Lead evaluation strategy for Agentic AI and RAG systems — defining metrics, success criteria, and real-world test cases.
  • Build reproducible evaluation pipelines (datasets, configs, automated runs) to track progress over time.
  • Develop and refine frameworks that go beyond benchmarks to assess reasoning, grounding, and robustness.
  • Create and curate high-quality datasets (synthetic, adversarial, real-world).
  • Implement LLM-as-a-judge evaluations aligned with human feedback (an illustrative sketch follows this list).
  • Analyze failures, identify root causes, and drive continuous system improvement.
  • Partner with ML teams to close the loop between evaluation, data creation, and model refinement.
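
As a purely illustrative aside, the sketch below shows one minimal shape an LLM-as-a-judge evaluation loop of this kind could take. It is an assumption for illustration, not the employer's actual pipeline: the EvalCase fields, the 1-5 grounding rubric, and the stub judge_fn are hypothetical, and in practice judge_fn would wrap whichever model and framework (HuggingFace, LangGraph, etc.) the team actually uses.

```python
# Illustrative sketch only: a minimal, reproducible LLM-as-a-judge evaluation loop.
# EvalCase, the rubric, and the stub judge below are hypothetical placeholders.
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    question: str   # user query fed to the system under test
    answer: str     # answer produced by the RAG / agentic system
    reference: str  # curated ground-truth answer


def judge_prompt(case: EvalCase) -> str:
    # Rubric-style prompt asking the judge model for a 1-5 grounding score.
    return (
        "Rate the ANSWER against the REFERENCE for factual grounding on a 1-5 scale.\n"
        f"QUESTION: {case.question}\nANSWER: {case.answer}\nREFERENCE: {case.reference}\n"
        "Reply with a single integer."
    )


def run_eval(cases: List[EvalCase], judge_fn: Callable[[str], str]) -> dict:
    # judge_fn wraps whatever LLM the team actually uses; here it is just a callable.
    scores = []
    for case in cases:
        raw = judge_fn(judge_prompt(case)).strip()
        scores.append(int(raw) if raw.isdigit() else 1)  # worst score on parse failure
    return {"n_cases": len(cases), "mean_score": sum(scores) / max(len(scores), 1)}


if __name__ == "__main__":
    demo = [EvalCase("What is 2+2?", "4", "4")]
    # Stub judge so the sketch runs offline; swap in a real model call in practice.
    print(json.dumps(run_eval(demo, judge_fn=lambda prompt: "5"), indent=2))
```

Keeping the judge behind a plain callable keeps runs reproducible and framework-agnostic, which fits the pipeline-focused responsibilities listed above.
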
What You’ll Bring
  • MSc / PhD in CS, ML, Data Science, Engineering, or related field.
  • 3+ years (mid-level) or 5+ years (senior) in applied AI / ML, with hands‑on production experience.
  • Strong background in evaluating LLMs, RAG, or multi‑agent systems.
  • Proficiency in Python, Docker, Git, and ML frameworks (PyTorch, HuggingFace, LangGraph, LlamaIndex, etc.).
  • Experience with data curation, synthetic data generation, and cloud environments (AWS preferred).
  • Excellent communication skills and a passion for building reliable, intelligent AI systems.