Senior Researcher - Text to Speech | Bengaluru

Smallest

Bengaluru

On-site

INR 4,50,000 - 6,75,000

Full time

Today

Job summary

A leading technology firm in Bengaluru is seeking a skilled individual to lead research on Text-to-Speech models. The ideal candidate will have a strong background in speech generation, experience with deep learning frameworks like PyTorch, and knowledge of real-time TTS systems. A Master's or PhD in a relevant field is required. Join us to design and improve innovative TTS systems catering to diverse accents and styles while collaborating with expert teams.

Qualifications

3–6 years of specialized experience in speech through academia or industry.

Responsibilities

Lead research on Text-to-Speech models focused on naturalness and expressiveness.
Design and train TTS systems for real-world voices across accents.
Improve streaming and low-latency speech synthesis pipelines.

Skills

Text-to-Speech / speech generation research

Deep learning frameworks (PyTorch preferred)

Real-time or low-latency TTS systems

Modern TTS architectures

End-to-end thinking from data to deployment

Multilingual, expressive, or accented speech synthesis

Education

Master’s or PhD in Speech, ML, or a related field

What you’ll do

Lead research on Text-to-Speech models focused on naturalness, expressiveness, latency, and robustness
Design and train TTS systems for real-world voices across accents, languages, and speaking styles
Improve streaming and low-latency speech synthesis pipelines
Experiment with architectures, loss functions, and data strategies (multi-speaker training, style modeling, distillation, data augmentation)
Translate research ideas into production-ready TTS systems
Collaborate closely with infra, product, and voice engineering teams

What we’re looking for

Strong background in Text-to-Speech / speech generation research
Hands-on experience with deep learning frameworks (PyTorch preferred)
Experience with real-time or low-latency TTS systems
Familiarity with modern TTS architectures (Tacotron-style, FastSpeech, VITS, diffusion-based, neural vocoders)
Ability to think end-to-end: data → model → inference → deployment
Prior work in multilingual, expressive, or accented speech synthesis is a strong plus

Great to have

Publications in top speech / ML conferences
Experience deploying TTS models in real-time production
Exposure to conversational AI or voice agents

Years of Experience

3–6 years of specialized experience in speech through academia or industry

Note: We often make exceptions and hire brilliant candidates regardless of years of experience or education. Proof of work is paramount.
