

Post-doctoral fellow in model-based reinforcement learning - 24 months contract

Télécom Paris

France

Hybrid

EUR 40 000 - 60 000

Full-time

Today


Job summary

A prestigious French engineering school is seeking a postdoctoral researcher in model-based reinforcement learning. The role focuses on developing verifiable world models for reinforcement learning applications in safety-critical domains. Candidates should possess a PhD, strong theoretical knowledge, and programming experience in related environments. The position offers flexible working hours and substantial annual leave, fostering a collaborative and impactful research community.

Benefits

49 days annual leave
Flexible working hours
Telecommuting 1 to 3 days/week possible
75% public transport pass reimbursement

Qualifications

  • Solid theoretical understanding of reinforcement learning.
  • Proven experience in programming reinforcement learning agents.
  • Ability to publish in leading scientific conferences and journals.

Responsibilities

  • Carry out research missions in model-based reinforcement learning.
  • Ensure supervision and tutoring missions.
  • Contribute to the reputation of the school and the institutes.

Knowledge

Programming in reinforcement learning
Mathematics
Fluent in English

Education

PhD or equivalent

Tools

JAX
PyTorch
Gym

Job description

Organisation/Company: Télécom Paris
Research Field: Computer science » Modelling tools
Researcher Profile: First Stage Researcher (R1)
Positions: Postdoc Positions
Country: France
Application Deadline: 10 Jan 2026 - 00:00 (Africa/Abidjan)
Type of Contract: Temporary
Job Status: Full-time
Is the job funded through the EU Research Framework Programme? Not funded by an EU programme
Is the job related to a staff position within a Research Infrastructure? Yes

Offer Description

Who we are

Télécom Paris, part of the IMT (Institut Mines-Télécom) and a founding member of the Institut Polytechnique de Paris, is one of France's top 5 general engineering schools.

The driving purpose of Télécom Paris is to train, imagine and innovate in order to design digital models, technologies and solutions for a society and economy that respect people and their environment.

We are looking for our future postdoctoral researcher in model-based reinforcement learning to join the Computer Science and Networks (INFRES) department at Télécom Paris.

Reinforcement learning (RL) has emerged as a useful paradigm for training agents to perform complex tasks. Model-based RL (MBRL), in particular, promises greater sample efficiency and sophisticated planning capabilities by enabling an agent to learn a predictive model of its environment. However, the direct application of current MBRL methods to safety-critical domains, such as autonomous robotics, transportation, or industrial control, is hindered by unresolved challenges.

The core scientific challenge: the limitations of current world models. Standard approaches to MBRL typically learn a monolithic, “black-box” world model, often using a large neural network as a function approximator. While these models can be highly effective for prediction within their training distribution, they suffer from two key limitations for deployment in sociotechnical systems:

  • Brittleness and unpredictable failures: Learned models are prone to unpredictable failures when the agent encounters unseen states or dynamics (i.e., distributional shift). These failures are difficult to anticipate and can lead to unsafe behavior, as the model’s predictions are no longer reliable.
  • Lack of verifiability: The learned models are opaque and do not come with formal guarantees. It is not possible to prove that the model will consistently respect fundamental constraints of the real world or be aligned with expected values, such as physical laws, safety rules, or logical invariants. This lack of verifiable correctness is a major barrier to building trustworthy and well‑calibrated autonomous systems.

Research focus: Verifiable world models. The research will focus on developing a new class of structured, verifiable world models that integrate the flexibility of deep learning with the rigor of formal methods and compositional reasoning. The core research thrusts of this position are:

• Structured, neurosymbolic models: The research will investigate model architectures that are not learned from a blank slate. Instead, they will be designed to incorporate explicit symbolic knowledge. This could include known physical laws, logical rules, or safety constraints, which are treated as fixed, verifiable components of the model. The learning process then focuses on modeling the more complex, unknown aspects of the environment around these established truths.
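
As one deliberately toy reading of this thrust, a world model can keep a known physical law as a fixed, verifiable component and learn only a residual term around it. The environment, the drag coefficient, and all names below are assumptions made for the sketch, not the project's actual design.

```python
import numpy as np

DT, G = 0.05, 9.81  # time step and gravity: fixed, verifiable knowledge

def known_dynamics(state):
    """Symbolic component: 1-D free fall under gravity (never retrained)."""
    pos, vel = state
    return np.array([pos + vel * DT, vel - G * DT])

class ResidualWorldModel:
    """Learns only the unknown part of the dynamics (here, linear drag)."""
    def __init__(self, lr=0.1):
        self.coef = 0.0  # learned drag coefficient, starts at "no drag"
        self.lr = lr

    def predict(self, state):
        nxt = known_dynamics(state)          # start from the physics prior
        nxt[1] -= self.coef * state[1] * DT  # learned correction on velocity
        return nxt

    def update(self, state, next_state):
        # One gradient step on the squared velocity error w.r.t. the coefficient.
        err = self.predict(state)[1] - next_state[1]
        self.coef -= self.lr * 2.0 * err * (-state[1] * DT)

def true_step(state, drag=0.3):
    """Ground-truth environment: gravity plus a drag term the model must discover."""
    pos, vel = state
    return np.array([pos + vel * DT, vel - (G + drag * vel) * DT])

model = ResidualWorldModel()
start = np.array([0.0, 10.0])
for _ in range(500):
    model.update(start, true_step(start))  # fit the residual on one transition
```

Because gravity is hard-coded rather than learned, that part of the model holds by construction; only the small residual needs data.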

• Compositional reasoning for safety: We will explore how a complex world model can be constructed by composing smaller, more specialized sub-models. A key research question is how to formally verify properties of the composite model based on the known properties of its individual components. This provides a modular and scalable path to certifying that the agent’s internal model of the world is, and remains, consistent with its safety specifications.
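
A minimal sketch of the compositional idea, assuming Lipschitz bounds as the certified property (one of many possible choices); the class and function names below are illustrative, not an established API.

```python
import functools
import numpy as np

class CertifiedModel:
    """A sub-model packaged with a certified Lipschitz bound on its output."""
    def __init__(self, fn, lipschitz):
        self.fn = fn
        self.lipschitz = lipschitz

def compose(*models):
    """Sequential composition: the certificate of the whole is derived from the
    components' certificates (Lipschitz bounds multiply), with no re-analysis
    of the composite function."""
    fns = [m.fn for m in models]
    bound = functools.reduce(lambda b, m: b * m.lipschitz, models, 1.0)
    return CertifiedModel(
        lambda x: functools.reduce(lambda s, f: f(s), fns, x),
        bound,
    )

# Two certified sub-models and their composition.
contraction = CertifiedModel(lambda x: 0.5 * x, 0.5)
saturation = CertifiedModel(np.tanh, 1.0)  # tanh is 1-Lipschitz
pipeline = compose(contraction, saturation)

# The derived certificate really does bound the composite's sensitivity.
x, y = 2.0, -1.0
gap = abs(pipeline.fn(x) - pipeline.fn(y))
```

The point is the modularity: verifying each sub-model once is enough to certify any pipeline built from them.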

• Model adaptation: A truly intelligent agent must be able to adapt its understanding of the world from experience. This research will develop a framework for safe model adaptation. This involves creating MBRL algorithms where the agent can propose updates to its own world model structure, but these updates are only accepted after a formal verification step confirms that the new model still adheres to its core safety properties.
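
The accept-or-reject loop can be sketched as follows, with a probe-based check standing in for the actual formal verification step; the invariant and all names are assumptions for illustration.

```python
import numpy as np

SPEED_LIMIT = 1.0  # core safety property: predicted speed stays bounded

def verifies(model, probe_states):
    """Stand-in for a formal verification step: here the invariant is
    simply checked on a finite set of probe states."""
    return all(abs(model(s)) <= SPEED_LIMIT for s in probe_states)

def gated_update(current, proposal, probe_states):
    """Adopt a proposed world-model update only if verification succeeds;
    otherwise keep the last verified model."""
    if verifies(proposal, probe_states):
        return proposal, True
    return current, False

probes = np.linspace(-1.0, 1.0, 11)
model = lambda s: 0.5 * s  # current, verified model

# A safe proposal is adopted; an unsafe one is rejected and the old model kept.
model, accepted = gated_update(model, lambda s: 0.9 * s, probes)
model, rejected_ok = gated_update(model, lambda s: 1.5 * s, probes)
```

The agent never runs an unverified model: every adaptation passes through the gate first.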

• Multitask learning: Task decomposition allows agents to learn transversal skills that can be useful in different contexts. Shared representations, multitask and multiobjective RL paradigms improve generalization. The research in this area will explore how to capture task decomposition in world models to enable multitask specifications with verifiable guarantees.
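
One simple way to picture shared representations across tasks: a single dynamics model reused by several per-task reward heads. The 1-D environment, the two tasks, and the one-step planner below are toy assumptions, not the project's formulation.

```python
import numpy as np

def shared_dynamics(state, action):
    """One dynamics model reused across all tasks (the transversal skill)."""
    return state + 0.1 * action

# Per-task objectives plugged into the same world model.
reward_heads = {
    "reach_goal": lambda s: -abs(s - 1.0),  # move toward s = 1
    "stay_home": lambda s: -abs(s),         # stay at s = 0
}

def plan_one_step(state, task):
    """One-step planning: score candidate actions with the shared model
    and the selected task's reward head."""
    actions = np.linspace(-1.0, 1.0, 21)
    scores = [reward_heads[task](shared_dynamics(state, a)) for a in actions]
    return float(actions[int(np.argmax(scores))])

toward_goal = plan_one_step(0.0, "reach_goal")  # pushes toward the goal
staying_put = plan_one_step(0.0, "stay_home")   # stays near the origin
```

Adding a task means adding a reward head, not relearning the dynamics, which is where the generalization benefit comes from.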

The successful candidate will lead work on these open problems through the development and implementation of RL algorithms. They will have the opportunity to make a significant impact on the field of trustworthy and well‑calibrated artificial intelligence (AI) through international collaborations (e.g., UT Austin, MIT).

Your main responsibilities
  • Carry out research missions in the field of model-based RL.
  • Ensure supervision and tutoring missions.
  • Contribute to the reputation of the School, the Institut Mines‑Télécom and the Institut Polytechnique de Paris.
Job requirements

We are looking for a candidate with a solid theoretical understanding of reinforcement learning and a strong foundation in mathematics. You must also have proven experience in programming reinforcement learning agents, particularly with tools such as JAX, PyTorch, or Gym.

A proven ability to publish in leading scientific conferences and journals is essential, as is an aptitude for sharing and disseminating your knowledge within the team. You hold a PhD or equivalent and are fluent in professional English, which will allow you to thrive in an international environment.

Why join us?
  • 49 days annual leave (CA + RTT)
  • Flexible working hours (depending on department activity)
  • Telecommuting 1 to 3 days/week possible
  • 75% public transport pass reimbursement
  • Proximity to numerous sports facilities, concierge service, underground parking, in‑house catering, etc.
  • Good to know: our social security contributions are lower than in the private sector