Platform Engineer II - Machine Learning Infrastructure

Spotify

Toronto

Hybrid

CAD 100,000 - 130,000

Full time

3 days ago

Job summary

A leading company in the music streaming industry is seeking a Machine Learning Infrastructure Engineer to join their Hendrix ML Platform team in Toronto. This role involves managing Kubernetes clusters, collaborating with teams, and developing scalable ML solutions. The ideal candidate should have extensive experience with production ML infrastructure and cloud services, along with a solid understanding of deep learning and agile processes. Flexibility in work location is offered, with some in-person meetings.

Qualifications

  • 3+ years of hands-on experience implementing production ML infrastructure at scale.
  • Experience with public cloud providers like GCP, AWS, or Azure.

Responsibilities

  • Manage and maintain large-scale production Kubernetes clusters for ML workloads.
  • Collaborate with MLEs and product teams to deliver scalable ML platform tooling solutions.

Skills

Python
Go
Agile Software Processes
Modular Code Design

Tools

GCP
AWS
Azure
Huggingface
Ray
PyTorch
TensorFlow
Kubernetes

Job description

The Hendrix ML Platform team is dedicated to developing a robust, Spotify-wide platform for training and serving machine learning models. This platform streamlines the productionization of AI and ML models by mitigating the incidental complexities involved in creating backend services for serving predictions and training models.


What You'll Do
  • Manage and maintain large-scale production Kubernetes clusters for ML workloads, including ML platform infrastructure and the necessary DevOps.
  • Contribute to Spotify ML Platform SDK and build tools for various ML operations.
  • Collaborate with Machine Learning Engineers (MLE), researchers, and various product teams to deliver scalable ML platform tooling solutions that meet the timelines and specifications of given requirements.
  • Work independently and collaboratively on squad projects that often require learning and applying new technologies beyond your existing skill set.
  • Design, document, and implement reliable, testable, and maintainable ML infrastructure solutions.

Who You Are
  • You have 3+ years of hands-on experience implementing production ML infrastructure at scale in Python, Go or similar languages
  • You have 3+ years of experience working with a public cloud provider such as GCP, AWS, or Azure, preferably GCP
  • You have knowledge of deep learning fundamentals, algorithms, and open-source tools such as Huggingface, Ray, PyTorch or TensorFlow
  • Nice to have: an understanding of distributed training leveraging GPUs and Kubernetes
  • You have a general understanding of data processing for ML
  • You have experience with agile software processes and modular code design following industry standards

Where You'll Be
  • This role is based in Toronto.
  • We offer you the flexibility to work where you work best! There will be some in-person meetings, but the role still allows flexibility to work from home.

Similar jobs

Senior Full Stack Engineer (Remote)

Jerry

Toronto

Remote

CAD 90,000 - 130,000

2 days ago
Be an early applicant

Senior Full-Stack Engineer

Alvéole

Toronto

Remote

CAD 110,000 - 140,000

10 days ago

Platform Engineer II, Machine Learning Infrastructure

Spotify AB

Toronto

Hybrid

CAD 90,000 - 120,000

Yesterday
Be an early applicant

Software Engineer II (Merchant Data Platform)

Affirm

Kelowna

Remote

CAD 125,000 - 175,000

4 days ago
Be an early applicant

Software Engineer II (Merchant Risk Intelligence & Platform)

Affirm

Oshawa

Remote

CAD 125,000 - 175,000

20 days ago

Machine Learning Engineer I

Affirm

London

Remote

CAD 102,000 - 142,000

Yesterday
Be an early applicant

Machine Learning Engineer I

Affirm

Halifax

Remote

CAD 102,000 - 142,000

2 days ago
Be an early applicant

Lead Big Data Developer

Genesys

British Columbia

Remote

CAD 100,000 - 130,000

2 days ago
Be an early applicant