
A leading educational institution in Palaiseau is seeking an Assistant/Associate Professor in multimodal generative AI models. This role involves designing teaching programs, conducting cutting-edge research, and fostering collaborations in the digital technology domain. Candidates should hold a PhD and possess expertise in generative AI and machine learning. This position offers a permanent contract with competitive benefits, including hybrid work options and ample paid leave.
Job description
Télécom Paris, an international multidisciplinary center for education, research, and innovation, is a leader in the digital world.
We are looking for an Assistant / Associate Professor in multimodal generative AI models for audio to join the research group which is part of the Signal, Statistic and Learning (S2A) Team.
The number of methodological challenges raised by the application of Generative-AI approaches to audio data (speech, music, environmental sounds) is considerable. While advances in recent years have largely relied on pattern recognition models and optimization techniques to scale up, the emergence of generative models—whether based on diffusion models (score / flow matching) or autoregressive approaches—is now opening up new perspectives, while raising fundamental scientific questions.
The extreme complexity and diversity of audio data (multilingual speech, rich and varied musical signals, complex acoustic environments, biased or noisy data), combined with the growing demands of these applications (interpretability, reliability, robustness, fairness, near–real-time generation, control over style or generated content, etc.), make it necessary to rethink existing methodological and theoretical frameworks. These challenges take on an additional dimension with the development of multimodal generation, where audio is produced from heterogeneous modalities (e.g., text to audio, image to audio, or even video to audio), sensory modalities (brain to audio), or biological sensors (sweating, ECG, etc.). These scenarios raise new scientific challenges, both in terms of modeling (intermodal alignment, joint representation, generation control) and in terms of usage (perceptual quality, semiotic consistency, acceptability).
To succeed in this role, you should hold a PhD and be fluent in English.
The position is open to all candidates working in machine learning; expertise in the following areas will be preferred: