Speech/Audio Generation Researcher

ANUTTACON PTE. LTD.

Singapore

On-site

SGD 70,000 - 120,000

Full time

Yesterday

Job summary

A tech-focused company in Singapore is seeking an experienced professional to conduct cutting-edge research in speech and audio foundation models. The role involves designing, developing, and evaluating generative models, including multimodal approaches. Candidates should hold a Bachelor's degree in a related field and possess over 3 years of experience in machine learning and deep learning, specifically in speech and audio generation. Strong communication skills are essential for this innovative position.

Qualifications

3+ years experience in machine learning and deep learning.
Experience with speech/audio generation and large language models.
Strong communication and presentation skills.

Responsibilities

Conduct research in speech/audio foundation models.
Develop generative models integrating speech, text, and video.
Implement reinforcement learning techniques for model optimization.

Skills

Machine learning

Deep learning

Speech generation

Audio generation

Generative models

Education

Bachelor's degree in computer science or a related field

Key Responsibilities:

Conduct cutting-edge research and development in speech/audio foundation models.
Research, design, develop, and evaluate generative models, including multimodal approaches integrating speech, text, and video.
Conduct research on integrating Large Language Models (LLMs) into speech and audio generation (neural tokenizers, large language modeling, etc.).
Explore and implement Reinforcement Learning (RL) techniques for optimizing generative models in speech and audio applications.
Design and curate large-scale, high-quality multimodal datasets in collaboration with cross-functional teams.

Qualifications:

Bachelor's degree in computer science, mathematics, engineering, a related field, or equivalent professional experience.
3+ years of experience in one or more areas of machine learning and deep learning, including but not limited to:
  Speech/vocal generation (including text-to-speech, singing voice synthesis, prosody transfer, etc.)
  Audio generation (including text-to-music, etc.)
  Large-scale speech/audio self-supervised representation learning and foundation models
  Large Language Model pre-training and fine-tuning
In-depth knowledge of deep learning and generative models (Diffusion, Flow Matching, AR Transformer, Mamba, VAE, GAN, etc.).
Self-driven, innovative, collaborative, with strong communication and presentation skills.
