Senior Machine Learning Engineer - Speech / Voice AI (remote)

Robert Walters

Lancashire

Remote

GBP 60,000 - 85,000

Full time

Today

Job summary

A technology-enabled wellbeing platform is seeking a Senior Machine Learning Engineer to develop a voice generation and audio delivery system. This fully remote role focuses on creating empathetic text-to-speech capabilities and enhancing accessibility for neurodiverse users. The ideal candidate has extensive experience in machine learning, particularly in speech processing, and is familiar with TTS models and PyTorch.

Qualifications

Strong background in Machine Learning / Deep Learning with experience in speech or audio processing.
Experience fine-tuning or deploying modern TTS models (e.g., VITS, Bark, FastSpeech2).
Proficiency in PyTorch and optimizing GPU inference.

Responsibilities

Develop an in-house voice generation and audio delivery system.
Build a text-to-speech capability with natural, empathetic voices.
Implement multilingual functionality and customizable voice tones.

Skills

Machine Learning / Deep Learning
Speech or audio processing
Fine-tuning modern TTS models
Proficiency in PyTorch
Deploying ML models to production
Familiarity with AWS, GCP, or Azure

Our client is a technology-enabled wellbeing platform that supports neurodiverse users and individuals with disabilities to thrive in education, work, and everyday life. They plan to develop an in-house voice generation and audio delivery system to enhance accessibility and emotional engagement, and are looking for a remote ML Engineer to make it happen.

Senior Machine Learning Engineer - Speech / Voice AI (remote)

Contract length: 3 months
IR35 determination: Outside
Location: Fully remote

The client offers businesses a personal productivity app featuring tools for task breakdown, priority-setting, and structured support to manage anxiety, procrastination, and executive dysfunction. The platform combines tailored learning resources, assistive technology guidance, and mental health content in one accessible space. It serves both students and professionals, helping them build resilience, independence, and sustainable wellbeing through behaviour-change frameworks.

Responsibilities

Develop an in-house voice generation and audio delivery system to enhance accessibility and emotional engagement.
Build a text-to-speech capability that produces natural, empathetic voices for guided exercises and wellbeing content.
Implement multilingual functionality and customizable voice tones to support diverse user needs.
Enable dynamic personalization so users receive content in voices and styles suited to their preferences.
Integrate the audio system seamlessly with the existing app and backend for real-time playback and consistency across devices.
Create an inclusive, emotionally intelligent audio experience that deepens user connection and supports lasting behavioural wellbeing.

Required skills

Strong background in Machine Learning / Deep Learning with hands-on experience in speech or audio processing.
Experience fine-tuning or deploying modern TTS models (e.g., VITS, Bark, or FastSpeech2).
Proficiency in PyTorch (or similar) and comfort with optimizing GPU inference.
Experience deploying ML models to production and integrating via APIs.
Familiarity with AWS, GCP, or Azure for scalable deployment.

Desirable skills

Understanding of speaker cloning or emotional prosody control.
Experience with multilingual TTS or phoneme alignment.
Interest in ethical AI and accessible, emotionally sensitive applications.

Robert Walters Operations Limited is an employment business and employment agency and welcomes applications from all candidates.
