
PhD Position: Multimodal Automatic Detection of Stuttering-Like Disfluencies (M/F)

CNRS

France

On-site

EUR 40 000 - 60 000

Full-time

2 days ago

Job summary

A prestigious research institution in France seeks a PhD candidate for a research project on stuttering detection using multimodal deep learning techniques. The role involves designing a system that analyzes audio, video, and text inputs to improve detection accuracy. Candidates should hold a Master’s in computer science and have skills in machine learning, deep learning, and Python. Strong analytical and communication abilities are essential for collaboration across disciplines.

Qualifications

  • Master's degree in computer science required.
  • Strong skills in machine learning and deep learning essential.
  • Proficient in Python and frameworks such as PyTorch or TensorFlow.

Responsibilities

  • Implement and adapt StutterNet for audio encoding.
  • Develop vision models for video encoding.
  • Generate automatic transcriptions and encode using language models.
  • Implement strategies for multimodal fusion of representations.
  • Develop a classifier for detecting stuttering.

Knowledge

Machine learning
Deep learning
Python
Signal processing
Critical thinking
Communication skills

Education

Master’s degree in computer science

Tools

PyTorch
TensorFlow

Job description

  • Organisation/Company: CNRS
  • Department: PRAXILING
  • Research Field: Computer science, Mathematics » Algorithms
  • Researcher Profile: First Stage Researcher (R1)
  • Country: France
  • Application Deadline: 6 Jan 2026 - 23:59 (UTC)
  • Type of Contract: Temporary
  • Job Status: Full-time
  • Hours Per Week: 35
  • Offer Starting Date: 7 Jan 2026
  • Is the job funded through the EU Research Framework Programme? Not funded by an EU programme
  • Is the job related to a staff position within a Research Infrastructure? No

Offer Description

The PhD candidate will take part in a multidisciplinary research project involving two complementary laboratories: LORIA, a computer science lab with expertise in speech processing and deep learning, and PRAXILING, a language sciences lab known for its work in phonetics and stuttering. The research will rely on an existing annotated audiovisual corpus of French-speaking individuals with fluency disorders. The thesis will be jointly supervised by researchers in computer science and language sciences, ensuring interdisciplinary co-supervision. The doctoral work will be primarily conducted at LORIA in Nancy, with regular stays at PRAXILING in Montpellier to foster scientific collaboration and enrich the research approach through dual expertise.

Introduction

Stuttering, a fluency disorder affecting millions of individuals, is characterized by stuttering‑like disfluencies (blocks, prolongations, repetitions) linked to dysfunctions in speech motor control. While its automatic detection has already been explored using audio‑based models, current systems remain limited by low robustness, difficulty in identifying certain disfluencies such as silent blocks, and reliance on scarce data. This PhD project proposes a multimodal approach (audio, video, text) to enhance the accuracy and robustness of disfluency detection, leveraging an audiovisual corpus of French‑speaking individuals who stutter. The analysis will rely on modality‑specific encoding techniques, followed by a strategic fusion of their representations for final classification.

Aims

The aim of this PhD is to design, develop, and evaluate a multimodal deep learning approach for the automatic detection of stuttering‑like disfluencies in French, by combining audio, video, and textual modalities. The work will be based on an annotated audiovisual corpus of French‑speaking people who stutter, with particular focus on disfluencies that are difficult to detect through audio alone, such as silent blocks, and on robustness to individual variability.

Tasks:

  • Audio encoding: Implement and adapt StutterNet to extract acoustic features relevant to disfluency detection by capturing temporal dependencies.
  • Video encoding: Develop and train vision models (e.g., C3D or Transformers) to analyze video sequences for visual cues of stuttering (facial tension, blinking, atypical movements). Explore facial landmarks extracted with OpenFace or MediaPipe as a complementary or alternative source of features.
  • Text encoding: Generate automatic transcriptions via Whisper and encode them using pre‑trained language models (BERT, RoBERTa) to extract linguistic context and identify textual patterns characteristic of disfluencies.
  • Multimodal fusion: Implement and compare several strategies to fuse the representations from the three modalities, such as concatenation, adaptive attention mechanisms, or other approaches leveraging data complementarity.
  • Classification and evaluation: Develop a classifier operating on the fused representation to predict the presence or absence of stuttering within a given time window. Evaluation will rely on standard metrics (precision, recall, F1‑score, AUC), with results compared to expert manual annotations. Qualitative analyses will also be conducted to interpret model errors and refine the approach.
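To make the fusion and classification steps concrete, here is a minimal PyTorch sketch of one of the strategies mentioned above (adaptive attention over per-modality embeddings, followed by a binary classifier). All dimensions, the `AttentionFusion` class name, and the two-class output are illustrative assumptions, not part of the project specification.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse per-modality embeddings with learned attention weights.

    Illustrative sketch: each modality (audio, video, text) is projected
    into a shared space, weighted by a learned softmax score, summed,
    and passed to a binary stutter/fluent classifier.
    """
    def __init__(self, dims, d_fused=128):
        super().__init__()
        # one projection per modality into the shared space
        self.proj = nn.ModuleList([nn.Linear(d, d_fused) for d in dims])
        self.score = nn.Linear(d_fused, 1)       # one scalar weight per modality
        self.classifier = nn.Linear(d_fused, 2)  # stutter / fluent logits

    def forward(self, feats):
        # feats: list of (batch, dim_i) embeddings, one tensor per modality
        z = torch.stack([p(f) for p, f in zip(self.proj, feats)], dim=1)  # (B, 3, d)
        w = torch.softmax(self.score(z), dim=1)                           # (B, 3, 1)
        fused = (w * z).sum(dim=1)                                        # (B, d)
        return self.classifier(fused)

# Toy batch with made-up embedding sizes standing in for the audio
# (StutterNet-like), video (C3D-like), and text (BERT-like) encoders.
model = AttentionFusion(dims=[64, 96, 768])
logits = model([torch.randn(4, 64), torch.randn(4, 96), torch.randn(4, 768)])
print(logits.shape)  # torch.Size([4, 2])
```

Concatenation fusion, the other strategy named above, would simply replace the weighted sum with `torch.cat` over the projected embeddings; the attention variant has the advantage that the learned weights can be inspected to see which modality drives each prediction.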

Beyond detection, this PhD aims to contribute methodologically to the field of multimodal fusion applied to pathological speech, with potential impact in clinical contexts.

Required Skills

The candidate should hold a Master’s degree in computer science, have strong skills in machine learning and deep learning, and be proficient in Python and frameworks such as PyTorch or TensorFlow. An interest in signal processing (audio/video) and ideally in NLP is expected. Autonomy, rigor, critical thinking, and analytical abilities are essential, along with strong communication skills to work in a multidisciplinary environment. An interest in phonetics, linguistics, and speech disorders—particularly stuttering—would be a plus.
