
A prestigious research institution in France seeks a PhD candidate for a research project on stuttering detection using multimodal deep learning techniques. The role involves designing a system that analyzes audio, video, and text inputs to improve detection accuracy. Candidates should hold a Master’s in computer science and have skills in machine learning, deep learning, and Python. Strong analytical and communication abilities are essential for collaboration across disciplines.
Organisation/Company: CNRS
Department: PRAXILING
Research Field: Computer science; Mathematics » Algorithms
Researcher Profile: First Stage Researcher (R1)
Country: France
Application Deadline: 6 Jan 2026 - 23:59 (UTC)
Type of Contract: Temporary
Job Status: Full-time
Hours Per Week: 35
Offer Starting Date: 7 Jan 2026
Is the job funded through the EU Research Framework Programme? Not funded by an EU programme
Is the Job related to staff position within a Research Infrastructure? No
The PhD candidate will take part in a multidisciplinary research project involving two complementary laboratories: LORIA, a computer science lab with expertise in speech processing and deep learning, and PRAXILING, a language sciences lab known for its work in phonetics and stuttering. The research will rely on an existing annotated audiovisual corpus of French-speaking individuals with fluency disorders. The thesis will be jointly supervised by researchers in computer science and language sciences. The doctoral work will be conducted primarily at LORIA in Nancy, with regular stays at PRAXILING in Montpellier to foster collaboration and draw on the expertise of both labs.
Stuttering, a fluency disorder affecting millions of individuals, is characterized by stuttering‑like disfluencies (blocks, prolongations, repetitions) linked to dysfunctions in speech motor control. While its automatic detection has already been explored using audio‑based models, current systems remain limited by low robustness, difficulty in identifying certain disfluencies such as silent blocks, and reliance on scarce data. This PhD project proposes a multimodal approach (audio, video, text) to enhance the accuracy and robustness of disfluency detection, leveraging an audiovisual corpus of French‑speaking individuals who stutter. The analysis will rely on modality‑specific encoding techniques, followed by a strategic fusion of their representations for final classification.
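To make the "modality-specific encoding followed by fusion" idea concrete, here is a minimal sketch in plain Python. It is illustrative only and not the project's method: the encoders are stand-in single linear layers with untrained random weights, fusion is simple concatenation (one option among many), the feature sizes are arbitrary, and the "fluent" label and all function names are assumptions introduced for the example.

```python
import math
import random

# Label set: the three disfluency types named in the posting, plus an
# assumed "fluent" class added for illustration.
LABELS = ["fluent", "block", "prolongation", "repetition"]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real system would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode(features, weights):
    """Stand-in modality encoder: one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(audio, video, text, params):
    """Encode each modality separately, concatenate, then classify."""
    z_audio = encode(audio, params["audio"])
    z_video = encode(video, params["video"])
    z_text = encode(text, params["text"])
    fused = z_audio + z_video + z_text  # fusion by concatenation
    logits = [sum(w * z for w, z in zip(row, fused)) for row in params["classifier"]]
    return softmax(logits)


rng = random.Random(0)
EMBED_DIM = 4  # arbitrary embedding size per modality
params = {
    "audio": random_matrix(EMBED_DIM, 6, rng),
    "video": random_matrix(EMBED_DIM, 5, rng),
    "text": random_matrix(EMBED_DIM, 3, rng),
    "classifier": random_matrix(len(LABELS), 3 * EMBED_DIM, rng),
}

# Toy feature vectors standing in for one speech segment.
audio = [rng.random() for _ in range(6)]
video = [rng.random() for _ in range(5)]
text = [rng.random() for _ in range(3)]

probs = fuse_and_classify(audio, video, text, params)
prediction = LABELS[probs.index(max(probs))]
print(prediction, [round(p, 3) for p in probs])
```

With random weights the prediction is meaningless; the point is only the data flow, in which a cue visible in one modality (for example, video during a silent block) can still reach the classifier when the audio carries little signal.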
The aim of this PhD is to design, develop, and evaluate a multimodal deep learning approach for the automatic detection of stuttering‑like disfluencies in French, by combining audio, video, and textual modalities. The work will be based on an annotated audiovisual corpus of French‑speaking people who stutter, with particular focus on disfluencies that are difficult to detect through audio alone, such as silent blocks, and on robustness to individual variability.
Tasks:
Beyond detection, this PhD aims to contribute methodologically to the field of multimodal fusion applied to pathological speech, with potential impact in clinical contexts.
The candidate should hold a Master’s degree in computer science, have strong skills in machine learning and deep learning, and be proficient in Python and frameworks such as PyTorch or TensorFlow. An interest in signal processing (audio/video) is expected, ideally along with an interest in NLP. Autonomy, rigor, critical thinking, and analytical abilities are essential, along with strong communication skills to work in a multidisciplinary environment. An interest in phonetics, linguistics, and speech disorders, particularly stuttering, would be a plus.