Senior Data Engineer

Spotify

London

Remote

GBP 60,000 - 80,000

Full time

30+ days ago

Job summary

Join a forward-thinking team at a leading audio streaming platform, where you'll be at the forefront of building innovative text-to-speech technologies. In this dynamic role, you will develop large-scale data pipelines and contribute to machine learning projects that power exciting new AI experiences. With a focus on collaboration and mentorship, you'll work alongside talented engineers and researchers, delivering high-quality code and enhancing the team's best practices. This is a unique opportunity to make a significant impact in the field of generative voice products while enjoying the flexibility to work from anywhere in the UK.

Qualifications

  • Experience with high-volume data and distributed systems like Hadoop and GCP.
  • Strong Python skills with experience in data processing frameworks.

Responsibilities

  • Build large-scale speech and audio data pipelines using GCP and Apache Beam.
  • Collaborate with engineers and researchers to develop generative AI experiences.

Skills

Data Engineering
Python Programming
Machine Learning
Agile Software Processes
Collaboration

Tools

Google Cloud Platform
Apache Beam
Docker
AWS
Luigi
Airflow

Job description

The Speak team is Spotify's in-house text-to-speech (TTS) team, supporting products like DJ, AI Voice Translation, as well as the development of exciting new unreleased products. We focus on building world-class speech technologies that can power the next generation of personalized generative voice products at scale.


What You'll Do
  • Build large-scale speech and audio data pipelines on Google Cloud Platform using frameworks like Apache Beam.
  • Work on machine learning projects powering new generative AI experiences and helping to build state-of-the-art text-to-speech models.
  • Learn and contribute to the team's best practices and techniques for building data pipelines for large-scale generative models, including cleaning, filtering, classifying, and labeling.
  • Collaborate with other engineers, researchers, product managers, and stakeholders, taking on learning and leadership opportunities that arise.
  • Deliver scalable, testable, maintainable, and high-quality code.
  • Share knowledge, promote standard methodologies, and make your team the best version of itself through mentorship and constructive accountability.

Who You Are
  • You have Data Engineering experience and know how to work with high-volume, heterogeneous data, preferably with distributed systems such as Hadoop, Bigtable, Cassandra, GCP, or AWS.
  • You have experience building clean, high-quality datasets for training large-scale machine learning models, with a focus on audio data preferred.
  • You have experience with one or more higher-level Python or Java-based data processing frameworks such as Beam, Dataflow, Crunch, Scalding, Storm, Spark, etc.
  • You have strong Python programming abilities. You might have worked with Docker as well as Luigi, Airflow, or similar tools.
  • You care about quality and know what it means to ship high-quality code.
  • You have experience managing data retention policies.
  • You care about agile software processes, data-driven development, reliability, and responsible experimentation.
  • You understand the value of collaboration and partnership within teams.
  • You have experience in developing datasets tailored for training high-performance machine learning models.
  • Familiarity with generative models or audio-based machine learning applications is highly desirable.
  • You are proficient in cleaning, filtering, and evaluating dataset quality, leveraging both pre-trained and in-house machine learning models, as well as human evaluation techniques, to ensure optimal quality.

Where You'll Be
  • We offer you the flexibility to work where you work best! For this role, you can be within the UK region as long as we have a work location.
  • This team operates within the GMT time zone for collaboration.
