TTS Direction Algorithm Engineer

X STAR TECHNOLOGY PTE. LTD.

Singapore

On-site

SGD 80,000 - 120,000

Full time

11 days ago

Job summary

A leading technology company in Singapore seeks an experienced professional to design and implement TTS algorithms. The ideal candidate holds an MS or PhD in relevant fields and has proven experience in production-grade TTS systems. This role requires strong ML engineering skills, excellent collaboration abilities, and a deep understanding of the speech synthesis pipeline. Join our team to contribute to innovative voice technologies.

Qualifications

  • Proven experience in designing and shipping production-grade TTS or speech-generation systems.
  • Strong ML engineering skills with Python + PyTorch.
  • Excellent collaboration skills to work with cross-functional teams.

Responsibilities

  • Design and implement algorithms that steer TTS output.
  • Collaborate with researchers to apply breakthroughs in speech synthesis.
  • Optimize system latency and ensure responsible voice output.

Skills

ML engineering skills
Collaboration & communication
Understanding of speech synthesis pipeline
Experience in TTS or speech-generation systems

Education

MS or PhD in relevant fields

Tools

Python
PyTorch

Job description

Design and implement algorithms that direct or steer TTS output: e.g., controlling prosody, style, voice persona, emotion, pacing, emphasis, accent, dialect, intonation.

Collaborate with researchers and engineers to take breakthroughs in speech synthesis and apply them to production-scale TTS systems.

Work on components such as text normalization, phonetic/linguistic feature extraction, alignment modeling (text ↔ acoustic), prosody modeling, vocoder architectures or waveform generation.

Build evaluation frameworks and metrics for naturalness, intelligibility, expressiveness, latency, voice persona fidelity, bias/fairness across languages and dialects.

Create data pipelines and tooling for voice collection/labeling, human preference judgments, A/B testing for voice direction outputs.

Optimize system latency, throughput, memory/compute requirements, streaming support, real-time constraints for voice in conversation.

Ensure safe, inclusive, responsible voice output: avoid inappropriate style shifts, voice-likeness issues, unintended biases, or misinterpretations. Collaborate with Safety, Policy, and Product teams.

Integrate the directed-TTS algorithms into product platforms (conversational voice features, developer APIs, accessibility features) and work with product/infra teams to ensure scalability and reliability.

Qualifications
Required

MS or PhD in Computer Science, Electrical Engineering, Speech/Audio Signal Processing, Machine Learning, or equivalent experience.

Proven experience designing and shipping production-grade TTS or speech-generation systems: e.g., text-to-speech, voice conversion, expressive prosody modeling.

Deep understanding of the speech synthesis pipeline: text normalization, linguistic/phonetic features, acoustic modeling, vocoder/waveform generation, prosody modeling.

Strong ML engineering skills: Python + PyTorch (or another major framework), experience with data pipelines, model training/evaluation/serving, measurement of MOS/intelligibility/latency.

Experience with audio tooling, data augmentation for speech, and evaluation metrics for naturalness/latency/persona fidelity.

Excellent collaboration & communication skills; ability to work cross‑functionally with research, product, design, infrastructure, safety/policy teams.

Preferred

Experience with multilingual or dialectal voice systems, low‑latency streaming TTS, expressive or adaptive voice personas.

Prior publication or contribution in speech synthesis, voice ML research (e.g., at top conferences, patents).

Experience with large‑scale deployment of voice systems, optimizing latency/throughput in production (e.g., real‑time voice API processing).

Familiarity with voice likeness, speaker identity protection, and consent/licensing considerations in voice systems.

Experience with prosody control, emotion modeling, or voice style transfer in TTS.
