
A leading technology firm in Bengaluru is seeking a researcher to lead work on Text-to-Speech (TTS) models. The ideal candidate has a strong background in speech generation, hands-on experience with deep learning frameworks such as PyTorch, and experience with real-time TTS systems. A Master's or PhD in a relevant field is expected, though exceptions are made for candidates with strong proof of work. The role involves designing and improving TTS systems for diverse accents and speaking styles in collaboration with infra, product, and voice engineering teams.
Lead research on Text-to-Speech models focused on naturalness, expressiveness, latency, and robustness
Design and train TTS systems for real-world voices across accents, languages, and speaking styles
Improve streaming and low-latency speech synthesis pipelines
Experiment with architectures, loss functions, and data strategies (multi-speaker training, style modeling, distillation, data augmentation)
Translate research ideas into production-ready TTS systems
Collaborate closely with infra, product, and voice engineering teams
Strong background in Text-to-Speech / speech generation research
Hands-on experience with deep learning frameworks (PyTorch preferred)
Experience with real-time or low-latency TTS systems
Familiarity with modern TTS architectures (Tacotron-style, FastSpeech, VITS, diffusion-based, neural vocoders)
Ability to think end-to-end: data → model → inference → deployment
Prior work in multilingual, expressive, or accented speech synthesis is a strong plus
Publications in top speech / ML conferences
Experience deploying TTS models in real-time production
Exposure to conversational AI or voice agents
3–6 years of specialized experience in speech through academia or industry
Master’s or PhD in Speech, ML, or a related field
Note: We often make exceptions and hire brilliant candidates regardless of years of experience or education. Proof of work is paramount.