Machine Learning Ops (MLOps) - AI Foundation Models for Design

Autodesk, Inc.

Boston (MA)

Remote

USD 90,000 - 150,000

Full time

30+ days ago

Job summary

An innovative company is seeking an MLOps Developer to collaborate with top AI researchers in building and scaling foundation models for design data. The role involves tackling challenges in large-scale model training and data processing, and contributing to next-generation ML-powered features. In a fully remote-friendly environment, you'll work alongside a diverse global team that stays connected through regular collaborative offsites. If you're a self-starter with expertise in distributed systems and a passion for AI, this is a chance to make a significant impact in the design industry.

Qualifications

  • Experience with distributed systems for ML and deep learning at scale.
  • Strong knowledge of ML infrastructure and model parallelism techniques.

Responsibilities

  • Build scalable ML training pipelines and infrastructure for foundation model development.
  • Optimize distributed training systems and develop solutions for model parallelism.

Skills

Distributed Systems
Machine Learning Infrastructure
Data Engineering
Python Programming
Documentation Skills

Education

BSc in Computer Science
MSc in Computer Science

Tools

PyTorch
AWS
Azure
Apache Spark
Docker
Kubernetes

Job description

Machine Learning Ops (MLOps) - AI Foundation Models for Design

Job Requisition ID # 25WD87628

Job Title: MLOps Developer for AI Research

Position Overview

The work we do at Autodesk touches nearly every person on the planet. By creating software tools for making buildings, machines, and even the latest movies, we influence and empower some of the most creative people in the world to solve problems that matter.

As an MLOps Developer at Autodesk Research, you will work side-by-side with world-class AI researchers to build and scale foundation models trained on design data. You will focus on overcoming the challenges of large-scale model training and of processing vast amounts of diverse design data. Your expertise in distributed systems, ML infrastructure, and data engineering will be crucial in developing the next generation of ML-powered product features that will help our customers imagine, design, and make a better world.

This role is fully remote-friendly. Our team operates primarily remotely, with members distributed across the globe and offices in London, Boston, Toronto, and other locations worldwide. At Autodesk, we embrace remote work while fostering connection through regular team offsites for collaborative planning and relationship building. This balanced approach ensures you can work where you're most productive while maintaining meaningful connections with colleagues.

Responsibilities

  • Support AI researchers by building scalable ML training pipelines and infrastructure for foundation model development.
  • Design efficient data processing workflows for large-scale design datasets and industry-specific file formats.
  • Optimize distributed training systems and develop solutions for model parallelism, checkpointing, and efficient resource management.
  • Analyze performance bottlenecks and provide solutions to scaling problems.
  • Implement and maintain robust, testable code that is well documented and easy to understand.
  • Collaborate on projects at the intersection of research and product with a diverse, global team of researchers and engineers.
  • Present results to collaborators and leadership.

Minimum Qualifications

  • BSc or MSc in Computer Science or related field, or equivalent industry experience.
  • Experience with distributed systems for machine learning and deep learning at scale.
  • Strong knowledge of ML infrastructure and model parallelism techniques, including frameworks like PyTorch, Lightning, Megatron, DeepSpeed, and FSDP.
  • Proficiency in Python and strong software engineering practices.
  • Experience with cloud services and architectures (AWS, Azure, etc.).
  • Familiarity with version control, CI/CD, and deployment pipelines.
  • Excellent written documentation skills to document code, architectures, and experiments.

Preferred Qualifications

  • Experience with AEC data formats (e.g., BIM models, IFC files, CAD files, Drawing Sets).
  • Knowledge of the AEC industry and its specific data processing challenges.
  • Experience scaling ML training and data pipelines for large datasets.
  • Experience with distributed data processing and ML infrastructure (e.g., Apache Spark, Ray, Docker, Kubernetes).
  • Experience with performance optimization, monitoring, and efficiency in large-scale ML systems.
  • Experience with Autodesk or similar products (Revit, SketchUp, Forma).

The Ideal Candidate

  • A self-starter who can solve problems with minimal supervision while collaborating effectively with a global, remote-first team.
  • Adaptable and creative, comfortable building new infrastructure or working within existing codebases.
  • Thrives in ambiguous, rapidly evolving areas where learning and flexibility are essential.
  • Excellent communicator who can convey complex technical concepts clearly to diverse audiences.