Platform Engineer II, Machine Learning Infrastructure

Spotify AB

Toronto

Hybrid

CAD 90,000 - 120,000

Full time

2 days ago

Job summary

Join the Hendrix ML Platform team at a leading audio streaming service in Toronto, where you'll manage large-scale production Kubernetes clusters and contribute to the ML platform SDK. The role offers flexible work arrangements, extensive learning opportunities, and the chance to help shape the future of machine learning infrastructure.

Benefits

Extensive learning opportunities
Flexible share incentives
Global parental leave
Employee assistance program
Flexible public holidays

Qualifications

  • 3+ years of hands-on experience implementing production ML infrastructure.
  • Experience with public cloud providers like GCP, AWS, or Azure.

Responsibilities

  • Manage and maintain large-scale production Kubernetes clusters for ML workloads.
  • Collaborate with MLEs, researchers, and product teams for scalable ML solutions.

Skills

Python
Go
Kubernetes
Machine Learning
Agile Software Processes

Tools

GCP
AWS
Azure
Hugging Face
Ray
PyTorch
TensorFlow

Job description

The Hendrix ML Platform team is dedicated to developing a robust, Spotify-wide platform for training and serving machine learning models. This platform streamlines the productionization of AI and ML models by mitigating the incidental complexities involved in creating backend services for serving predictions and training models.

Location
  • Toronto
Job type
  • Permanent

What You'll Do
  • Manage and maintain large-scale production Kubernetes clusters for ML workloads, including ML platform infrastructure and necessary DevOps.
  • Contribute to the Spotify ML Platform SDK and build tools for various ML operations.
  • Collaborate with Machine Learning Engineers (MLE), researchers, and product teams to deliver scalable ML platform solutions that meet timelines and requirements.
  • Work independently and collaboratively on squad projects, often learning and applying new technologies beyond existing skillsets.
  • Design, document, and implement reliable, testable, and maintainable solutions for ML infrastructure capabilities.
Who You Are
  • 3+ years of hands-on experience implementing production ML infrastructure at scale in Python, Go, or similar languages.
  • 3+ years of experience with a public cloud provider such as GCP, AWS, or Azure (preferably GCP).
  • Knowledge of deep learning fundamentals, algorithms, and open-source tools like Hugging Face, Ray, PyTorch, or TensorFlow.
  • A good understanding of distributed training with GPUs and Kubernetes is a plus.
  • General understanding of data processing for ML.
  • Experience with agile software processes and modular code design following industry standards.
Where You'll Be
  • This role is based in Toronto.
  • Work arrangements are flexible: remote work is allowed, with some in-person meetings.
Additional Benefits
  • Extensive learning opportunities through our dedicated team, GreenHouse.
  • Flexible share incentives.
  • Global parental leave: six months, fully paid, for new parents.
  • Employee assistance program and self-care hub.
  • Flexible public holidays: swap days off according to your values and beliefs.

Learn about life at Spotify and join a diverse, inclusive workplace where your voice matters. Our mission is to unlock human creativity and connect artists with fans worldwide. Since 2008, we’ve been transforming music listening and are now the world’s most popular audio streaming service with over 500 million users.

