Senior Machine Learning Engineer - Speech / Voice AI (remote)

Robert Walters UK

Manchester

Remote

GBP 100,000 - 125,000

Part time

Job summary

Robert Walters UK is recruiting a Senior Machine Learning Engineer on behalf of a client to develop an in-house voice generation system. This fully remote, 3-month contract focuses on creating empathetic audio experiences that improve accessibility for neurodiverse users. Candidates need strong skills in machine learning and audio processing, with experience deploying TTS models and optimizing GPU inference. The client is a mission-driven wellbeing platform serving users around the world.

Skills

Machine Learning

Deep Learning

Speech Processing

Audio Processing

PyTorch

GPU Inference

ML Model Deployment

API Integration

Cloud Services (AWS, GCP, Azure)

Our client is a technology-enabled wellbeing platform that helps neurodiverse users and individuals with disabilities thrive in education, work, and everyday life. They want to develop an in‑house voice generation and audio delivery system to enhance accessibility and emotional engagement, and are looking for a remote ML Engineer to build it.

Key details:

Contract length: 3 months
IR35 determination: Outside IR35
Location: Fully remote

Responsibilities:

Develop an in‑house voice generation and audio delivery system to enhance accessibility and emotional engagement.
Build a text‑to‑speech capability that produces natural, empathetic voices for guided exercises and wellbeing content.
Implement multilingual functionality and customizable voice tones to support diverse user needs.
Enable dynamic personalization so users receive content in voices and styles suited to their preferences.
Integrate the audio system seamlessly with the existing app and backend for real‑time playback and consistency across devices.
Create an inclusive, emotionally intelligent audio experience that deepens user connection and supports lasting behavioural wellbeing.

Required skills:

Strong background in Machine Learning / Deep Learning with hands‑on experience in speech or audio processing.
Experience fine‑tuning or deploying modern TTS models (e.g. VITS, Bark, or FastSpeech2).
Proficiency in PyTorch (or similar) and comfortable optimizing GPU inference.
Experience deploying ML models to production and integrating via APIs.
Familiarity with AWS, GCP, or Azure for scalable deployment.

Desirable:

Understanding of speaker cloning or emotional prosody control.
Experience with multilingual TTS or phoneme alignment.
Interest in ethical AI and accessible, emotionally sensitive applications.

This is an exciting opportunity to help shape an inclusive AI experience that brings empathy and accessibility to users around the world.
