Senior MLOps Lead for Scalable Foundation Models (Remote)

Autodesk

Vancouver

On-site

CAD 100,000 - 140,000

Full time

30+ days ago

Job summary

A leading technology company is seeking a Principal Machine Learning Operations Developer for AEC (architecture, engineering, and construction) to build scalable ML training pipelines and work alongside top AI researchers. The candidate will tackle the challenges of large-scale model training, with a focus on sustainable design data. Experience with distributed systems and proficiency in Python are required. The position is fully remote-friendly, allowing flexibility in your work environment.

Qualifications

  • Experience with distributed systems for machine learning at scale.
  • Strong knowledge of model parallelism techniques.
  • Experience with cloud services and architectures.

Responsibilities

  • Support AI researchers by building scalable ML training pipelines.
  • Design efficient data processing workflows for large-scale datasets.
  • Optimize distributed training systems and resource management.

Skills

Distributed systems for machine learning
ML infrastructure and model parallelism
Python proficiency
Excellent written documentation skills
Cloud services (AWS, Azure)

Education

BSc or MSc in Computer Science

Tools

PyTorch
DeepSpeed
Docker
Kubernetes
Apache Spark