The Speak team is Spotify's in-house text-to-speech (TTS) team, supporting products like DJ and AI Voice Translation, as well as the development of new, unreleased products. We focus on building world-class speech technologies that can power the next generation of personalized generative voice products at scale.
What You'll Do
- Build large-scale speech and audio data pipelines using tools like Google Cloud Platform and Apache Beam.
- Work on machine learning projects that power new generative AI experiences, and help build state-of-the-art text-to-speech models.
- Learn and contribute to the team's best practices and techniques for building data pipelines for large-scale generative models, including cleaning, filtering, classifying, and labeling.
- Collaborate with other engineers, researchers, product managers, and stakeholders, taking on learning and leadership opportunities that arise.
- Deliver scalable, testable, maintainable, and high-quality code.
- Share knowledge, promote standard methodologies, and make your team the best version of itself through mentorship and constructive accountability.
Who You Are
- You have data engineering experience and know how to work with high-volume, heterogeneous data, preferably with distributed systems such as Hadoop, Bigtable, Cassandra, GCP, or AWS.
- You have experience building clean, high-quality datasets for training large-scale machine learning models, preferably with a focus on audio data.
- You have experience with one or more higher-level Python or Java-based data processing frameworks such as Beam, Dataflow, Crunch, Scalding, Storm, Spark, etc.
- You have strong Python programming abilities. You might have worked with Docker as well as Luigi, Airflow, or similar tools.
- You care about quality and know what it means to ship high-quality code.
- You have experience managing data retention policies.
- You care about agile software processes, data-driven development, reliability, and responsible experimentation.
- You understand the value of collaboration and partnership within teams.
- You have experience in developing datasets tailored for training high-performance machine learning models.
- You are familiar with generative models or audio-based machine learning applications (highly desirable).
- You are proficient in cleaning, filtering, and evaluating dataset quality, leveraging both pre-trained and in-house machine learning models, as well as human evaluation techniques, to ensure optimal quality.
Where You'll Be
- We offer you the flexibility to work where you work best! For this role, you can be within the UK region as long as we have a work location.
- This team operates within the GMT time zone for collaboration.