Senior Researcher - Text to Speech | Bengaluru

Smallest

Bengaluru

On-site

INR 4,50,000 - 6,75,000

Full time

Today

Job summary

Smallest is seeking a senior researcher in Bengaluru to lead research on Text-to-Speech (TTS) models. The ideal candidate has a strong background in speech generation, hands-on experience with deep learning frameworks such as PyTorch, and knowledge of real-time TTS systems. A Master's or PhD in a relevant field is expected, though exceptions are made for candidates with strong proof of work. The role involves designing and improving TTS systems that cover diverse accents and speaking styles, in collaboration with infra, product, and voice engineering teams.

Qualifications

  • 3–6 years of specialized experience in speech through academia or industry.

Responsibilities

  • Lead research on Text-to-Speech models focused on naturalness and expressiveness.
  • Design and train TTS systems for real-world voices across accents.
  • Improve streaming and low-latency speech synthesis pipelines.

Skills

  • Text-to-Speech / speech generation research
  • Deep learning frameworks (PyTorch preferred)
  • Real-time or low-latency TTS systems
  • Modern TTS architectures
  • End-to-end thinking from data to deployment
  • Multilingual, expressive, or accented speech synthesis

Education

  • Master’s or PhD in Speech, ML, or a related field

Job description

What you’ll do
  • Lead research on Text-to-Speech models focused on naturalness, expressiveness, latency, and robustness

  • Design and train TTS systems for real-world voices across accents, languages, and speaking styles

  • Improve streaming and low-latency speech synthesis pipelines

  • Experiment with architectures, loss functions, and data strategies (multi-speaker training, style modeling, distillation, data augmentation)

  • Translate research ideas into production-ready TTS systems

  • Collaborate closely with infra, product, and voice engineering teams

What we’re looking for
  • Strong background in Text-to-Speech / speech generation research

  • Hands-on experience with deep learning frameworks (PyTorch preferred)

  • Experience with real-time or low-latency TTS systems

  • Familiarity with modern TTS architectures (Tacotron-style, FastSpeech, VITS, diffusion-based, neural vocoders)

  • Ability to think end-to-end: data → model → inference → deployment

  • Prior work in multilingual, expressive, or accented speech synthesis is a strong plus

Great to have
  • Publications in top speech / ML conferences

  • Experience deploying TTS models in real-time production

  • Exposure to conversational AI or voice agents

Years of Experience
  • 3–6 years of specialized experience in speech through academia or industry

Education
  • Master’s or PhD in Speech, ML, or a related field

Note: We often make exceptions and hire brilliant candidates regardless of years of experience or education. Proof of work is paramount.
