
A prestigious French engineering school is seeking a postdoctoral researcher in model-based reinforcement learning. The role focuses on developing verifiable world models for reinforcement learning applications in safety-critical domains. Candidates should possess a PhD, strong theoretical knowledge, and programming experience in related environments. The position offers flexible working hours and substantial annual leave, fostering a collaborative and impactful research community.
Organisation/Company: Télécom Paris
Research Field: Computer science » Modelling tools
Researcher Profile: First Stage Researcher (R1)
Positions: Postdoc Positions
Country: France
Application Deadline: 10 Jan 2026 - 00:00 (Africa/Abidjan)
Type of Contract: Temporary
Job Status: Full-time
Is the job funded through the EU Research Framework Programme? Not funded by an EU programme
Is the Job related to staff position within a Research Infrastructure? Yes
Who we are?
Télécom Paris, part of the IMT (Institut Mines-Télécom) and a founding member of the Institut Polytechnique de Paris, is one of France's top 5 general engineering schools.
The mission of Télécom Paris is to train, imagine and innovate in order to design digital models, technologies and solutions for a society and economy that respect people and their environment.
We are looking for our future postdoctoral researcher in model-based reinforcement learning to join the Computer Science and Networks (INFRES) department at Télécom Paris.
Reinforcement learning (RL) has emerged as a useful paradigm for training agents to perform complex tasks. Model-based RL (MBRL), in particular, promises greater sample efficiency and sophisticated planning capabilities by enabling an agent to learn a predictive model of its environment. However, the direct application of current MBRL methods to safety-critical domains, such as autonomous robotics, transportation, or industrial control, is hindered by unresolved challenges. The core scientific challenge lies in the limitations of current world models. Standard approaches to MBRL typically learn a monolithic, “black-box” world model, often using a large neural network as a function approximator. While these models can be highly effective for prediction within their training distribution, they suffer from two key limitations for deployment in sociotechnical systems: their internal structure is opaque, which makes them difficult to verify formally, and they offer no guarantees on behavior outside their training distribution.
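To make the MBRL loop concrete, here is a minimal, hypothetical toy sketch (in plain NumPy, unrelated to any project codebase): the agent fits a predictive model of unknown 1-D dynamics from observed transitions, then plans against the learned model. The names `true_step` and `plan` are illustrative assumptions, not part of any real library.

```python
# Toy MBRL sketch: learn a world model from transitions, then plan with it.
import numpy as np

rng = np.random.default_rng(0)

def true_step(s, a):
    # Unknown environment dynamics the agent must model.
    return 0.9 * s + 0.5 * a

# 1. Collect transitions (s, a, s') under random exploration.
S = rng.uniform(-1, 1, size=(200, 2))              # columns: state, action
S_next = np.array([true_step(s, a) for s, a in S])

# 2. Fit a linear world model s' ~ w_s * s + w_a * a by least squares.
w, *_ = np.linalg.lstsq(S, S_next, rcond=None)

# 3. Plan with the learned model: choose the candidate action whose
#    predicted next state lands closest to a goal state.
def plan(s, goal, candidates=np.linspace(-1, 1, 101)):
    preds = w[0] * s + w[1] * candidates
    return candidates[np.argmin(np.abs(preds - goal))]

a = plan(s=1.0, goal=0.0)
print(w, a)  # the model recovers the dynamics and plans with them
```

Real MBRL replaces the linear fit with a deep network and the exhaustive action sweep with a planner such as model-predictive control, but the structure of the loop is the same.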
Research focus: Verifiable world models. The research will focus on developing a new class of structured, verifiable world models that integrate the flexibility of deep learning with the rigor of formal methods and compositional reasoning. The core research thrusts of this position are:
• Structured, neurosymbolic models: The research will investigate model architectures that are not learned from a blank slate. Instead, they will be designed to incorporate explicit symbolic knowledge. This could include known physical laws, logical rules, or safety constraints, which are treated as fixed, verifiable components of the model. The learning process then focuses on modeling the more complex, unknown aspects of the environment around these established truths.
• Compositional reasoning for safety: We will explore how a complex world model can be constructed by composing smaller, more specialized sub-models. A key research question is how to formally verify properties of the composite model based on the known properties of its individual components. This provides a modular and scalable path to certifying that the agent’s internal model of the world is, and remains, consistent with its safety specifications.
• Model adaptation: A truly intelligent agent must be able to adapt its understanding of the world from experience. This research will develop a framework for safe model adaptation. This involves creating MBRL algorithms where the agent can propose updates to its own world model structure, but these updates are only accepted after a formal verification step confirms that the new model still adheres to its core safety properties.
• Multitask learning: Task decomposition allows agents to learn transversal skills that can be useful in different contexts. Shared representations, multitask and multiobjective RL paradigms improve generalization. The research in this area will explore how to capture task decomposition in world models to enable multitask specifications with verifiable guarantees.
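The first and third thrusts above can be sketched in a few lines. The example below is a deliberately simplified, hypothetical illustration (not the project's method): a one-parameter "learned" model is paired with a fixed symbolic safety envelope, and a proposed model update is accepted only if a check confirms the envelope still holds. The finite-sample `verify` stands in for a sound formal verifier.

```python
# Sketch of a verifiable world model with safe model adaptation.

SAFE_MAX = 1.0  # fixed symbolic knowledge: |next state| must stay <= SAFE_MAX

def make_model(gain):
    """Learned component: a one-parameter dynamics model s' = gain * s."""
    def step(s):
        return gain * s
    return step

def verify(model, samples):
    # Placeholder for formal verification: check the safety invariant on a
    # finite set of states (a real system would verify over a symbolic
    # abstraction of the state space, not samples).
    return all(abs(model(s)) <= SAFE_MAX for s in samples)

def propose_update(current_gain, new_gain, samples):
    """Safe adaptation: accept the proposed model only if it verifies."""
    candidate = make_model(new_gain)
    return new_gain if verify(candidate, samples) else current_gain

states = [-1.0, -0.5, 0.0, 0.5, 1.0]
gain = propose_update(0.9, 0.95, states)  # stays in the envelope: accepted
gain = propose_update(gain, 1.2, states)  # would leave the envelope: rejected
print(gain)
```

The design choice this illustrates is that the safety constraint is never learned or overwritten: it gates every update to the learned component, which is what makes the composite model's safety property stable under adaptation.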
The successful candidate will take the lead in tackling these open problems through the development and implementation of RL algorithms. They will have the opportunity to make a significant impact in the field of trustworthy and well‑calibrated artificial intelligence (AI) through international collaborations (e.g., UT Austin, MIT).
We are looking for a candidate with a solid theoretical understanding of reinforcement learning, backed by a strong foundation in mathematics. You must also have proven experience in programming reinforcement learning agents, particularly with tools such as JAX, PyTorch, or Gym.
A proven ability to publish in leading scientific conferences and journals is essential, as is an aptitude for sharing and disseminating your knowledge within the team. You hold a PhD or equivalent, and your English is at a professional level, enabling you to thrive in an international environment.