Machine Learning Engineer (Post-Training)

Blue Yonder

Paris

On-site

EUR 60,000 - 90,000

Full-time

Posted today

Job summary

A leading AI-focused company in Paris is seeking a Machine Learning Engineer specializing in post-training. The ideal candidate will design environments for supply chain decision-making and build robust data pipelines. Essential skills include proficiency in Python and PyTorch, experience with large datasets, and a passion for research and engineering. This role blends creativity with practical implementation and offers the opportunity to shape innovative AI solutions in complex supply chain contexts.

Qualifications

Experience training or fine-tuning LLMs for agents with SFT/DPO.
Ability to design evaluation frameworks for agent performance.
Experience with RL techniques, including reward shaping and design.

Responsibilities

Design and implement post-training environments.
Build data pipelines for training and feedback.
Collaborate to translate research into reliable systems.

Skills

Python

PyTorch

HF Transformers

Research exploration

Large datasets

Curiosity

Tools

Kubernetes

AWS

GCP

Machine Learning Engineer – Post-Training, AI Studio

About the AI Studio

The AI Studio's mission is to find the fastest possible path to an autonomous supply chain. We're developing AI agents, learning systems, training models, and more to overcome the biggest challenges remaining in the global supply chain.

In short, we are having a lot of fun.

Your mission in this role

We're looking for an ambitious Machine Learning Engineer specializing in Post-Training to work on environments, evaluations, data pipelines, and tooling for robust training systems.

Your work will directly impact how our agents learn to make decisions in complex supply chain environments. You'll help shape how we approach reward modeling, environment design, and agent training.

This role blends research and engineering. You'll implement novel approaches and contribute to our research direction while shipping production-grade systems. If you're energized by pushing the boundaries of what's possible, this is your chance.

Responsibilities

Design and implement post-training environments for supply chain decision-making
Create evaluation frameworks to measure agent performance and catch failure modes
Build data pipelines for training and human feedback collection
Optimize training infrastructure for throughput, efficiency, and fault tolerance
Debug complex issues in training pipelines and model behavior
Collaborate with the team to translate research ideas into reliable systems
Document what works (and what doesn't) so we can compound our learnings
Stay on top of industry trends and cutting-edge use cases

We want to talk if you

Have trained or fine-tuned LLMs for agents with SFT/DPO
Are proficient in Python, PyTorch and HF Transformers
Can balance research exploration with shipping working code
Are comfortable working with large datasets and building data pipelines at scale
Thrive in fast-moving environments where priorities shift
Are excited about AI-assisted tools and getting the most out of them
Care about craft in your work
Have a deep sense of curiosity and make a habit of learning
Think globally about how your work impacts the entire organization

Bonus points if

You have hands‑on experience with RL techniques (reward shaping and design, PPO, GRPO and other RLHF approaches)
You have experience with distributed training systems and techniques (DDP, FSDP, N‑D parallelism)
You have experience with human‑in‑the‑loop ML systems
You've built evaluation frameworks for open-ended tasks
You're familiar with supply chain, logistics, or operations domains
You have experience with Kubernetes and cloud infrastructure (AWS, GCP)
You've worked on reward hacking detection or robustness problems
You have a side project that shows you can't stop tinkering

Our Values

If you want to know the heart of a company, take a look at its values. Ours unite us. They are what drive our success, and the success of our customers. Does your heart beat like ours? Find out here: Core Values

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.
