Speech/Audio Generation Researcher

ANUTTACON PTE. LTD.

Singapore

On-site

SGD 70,000 - 120,000

Full time

Yesterday

Job summary

A tech-focused company in Singapore is seeking an experienced professional to conduct cutting-edge research in speech and audio foundation models. The role involves designing, developing, and evaluating generative models, including multimodal approaches. Candidates should hold a Bachelor's degree in a related field and possess over 3 years of experience in machine learning and deep learning, specifically in speech and audio generation. Strong communication skills are essential for this innovative position.

Qualifications

  • 3+ years of experience in machine learning and deep learning.
  • Experience with speech/audio generation and large language models.
  • Strong communication and presentation skills.

Responsibilities

  • Conduct research in speech/audio foundation models.
  • Develop generative models integrating speech, text, and video.
  • Implement reinforcement learning techniques for model optimization.

Skills

Machine learning
Deep learning
Speech generation
Audio generation
Generative models

Education

Bachelor's degree in computer science or a related field

Job description

Key Responsibilities:

  • Conduct cutting-edge research and development in speech/audio foundation models.
  • Research, design, develop, and evaluate generative models, including multimodal approaches integrating speech, text, and video.
  • Conduct research to integrate Large Language Models (LLMs) into speech and audio generation (Neural Tokenizers, Large Language Modeling, etc.).
  • Explore and implement Reinforcement Learning (RL) techniques for optimizing generative models in speech and audio applications.
  • Design and curate large-scale, high-quality multimodal datasets in collaboration with cross-functional teams.

Qualifications:

  • Bachelor's degree in computer science, mathematics, engineering, or a related field, or equivalent professional experience.
  • 3+ years of experience in one or more areas of machine learning and deep learning, including but not limited to:
      • Speech/vocal generation (including text-to-speech, singing voice synthesis, prosody transfer, etc.)
      • Audio generation (including text-to-music, etc.)
      • Large-scale speech/audio self-supervised representation learning and foundation models
      • Large Language Model pre-training and fine-tuning
  • Deep knowledge of deep learning and generative models (Diffusion, Flow Matching, AR Transformer, Mamba, VAE, GAN, etc.).
  • Self-driven, innovative, collaborative, with strong communication and presentation skills.