Principal Machine Learning Ops Developer, AI/ML Platform

Autodesk

Canada

On-site

CAD 141,000 - 195,000

Full time

3 days ago

Job summary

Autodesk is looking for a Principal Machine Learning Operations Developer. This role involves designing and managing software systems that enhance ML capabilities, mentoring team members, and defining multi-cloud architectures. Successful candidates will have extensive experience in MLOps, Data Engineering, and DevOps, with a strong focus on collaborative and innovative solutions.

Qualifications

  • Over 8 years in software development and engineering.
  • Prior experience with MLOps teams, ML model deployment, DevOps, and data engineering.
  • Knowledge of AWS and/or Azure for large-scale application deployment.

Responsibilities

  • Design, implement, and manage software systems for the AI/ML Platform.
  • Share knowledge and best practices, and conduct design reviews.
  • Develop systems for monitoring model performance, data drift, and fairness/bias.

Skills

MLOps
Data Engineering
DevOps
Golang
Python
Java
Creative problem-solving
Excellent communication

Education

Bachelor’s degree in Computer Science or equivalent

Tools

CI/CD
Terraform
Docker
Kubernetes
GitOps
Spark
Airflow
MLflow
Kubeflow
TensorBoard

Job description

Job Requisition ID # 25WD89028

Principal Machine Learning Operations Developer

Position Overview

We are seeking an experienced Principal Software Engineer to join our AI/ML Platform (AMP) team. This team develops and maintains core components to accelerate ML/AI model development, including the model development studio, feature store, model serving, and observability tools. The ideal candidate will have a background in MLOps, Data Engineering, and DevOps, with experience in building scalable deployment architectures and observability systems. As a key member of our engineering team, you will help shape the future of our AI/ML capabilities, delivering innovative solutions that add value to our organization. You will report to a manager.

Responsibilities

  1. System Design: Design, implement, and manage software systems for the AI/ML Platform, overseeing the full ML development lifecycle for partner teams.

  2. Mentoring: Share knowledge and best practices, and conduct design reviews to elevate team expertise.

  3. Multi-cloud Architecture: Define components leveraging multiple cloud platforms (e.g., AWS, Azure) to optimize performance, cost, and scalability.

  4. AI/ML Observability: Develop systems for monitoring model performance, data drift, fairness/bias, and anomalies.

  5. ML Solution Deployment: Create tools for building and deploying ML artifacts in production environments, ensuring a smooth transition from development to deployment.

  6. Big Data Management: Automate and orchestrate large-scale data transformation and processing tasks, building data stores for ML artifacts.

  7. Scalable Services: Design low-latency, scalable prediction and inference services.

  8. Cross-Functional Collaboration: Work with machine learning researchers, developers, product managers, and operations teams to foster collaboration.

  9. End-to-End Ownership: Take ownership of components, including design, architecture, implementation, rollout, onboarding, support, testing, and investigations.

Minimum Qualifications

  1. Educational Background: Bachelor’s degree in Computer Science or equivalent practical experience.

  2. Experience: Over 8 years in software development and engineering, delivering production systems and services.

  3. Prior experience with MLOps teams, ML model deployment, DevOps, and data engineering.

  4. Hands-on skills in coding with Golang, Python, or Java.

  5. Knowledge of DevOps practices and tooling, including CI/CD, GitOps, infrastructure as code (Terraform), containerization (Docker), and orchestration (Kubernetes).

  6. Experience with distributed data processing and workflow orchestration frameworks such as Spark and Airflow, and with data lake architectures using formats such as Iceberg or Parquet.

  7. Experience collaborating with Data Science teams to deploy models and implement ML observability for inference monitoring.

  8. Exposure to building RAG-based applications in collaboration with product teams and AI engineers.

  9. Creative problem-solving skills, with the ability to break down complex problems.

  10. Knowledge of AWS and/or Azure for large-scale application deployment.

  11. Excellent communication and teamwork skills.

Preferred Qualifications

  1. Experience integrating with third-party vendors.

  2. Latency optimization skills for serving systems.

  3. Familiarity with tools such as MLflow, Kubeflow, and TensorBoard for model monitoring.

  4. Experience with distributed model training/inference pipelines using KubeRay or similar tools.

  5. Experience leveraging GPU computing (CUDA, OpenCL) for AI/ML workloads.

  6. Familiarity with ML libraries such as PyTorch, TensorFlow, XGBoost, Pandas, and Scikit-Learn.

About Autodesk

At Autodesk, we create software that transforms how things are made, from buildings to movies. We foster a culture of innovation, diversity, and belonging, where everyone can thrive and contribute to building a better future.

Salary Transparency

Starting base salary in Canada-BC ranges from $141,600 to $194,700, based on experience and location. Compensation includes bonuses, stock grants, and benefits.

Diversity & Belonging

We are committed to an inclusive culture. Learn more: https://www.autodesk.com/company/diversity-and-belonging

Existing Contractors or Consultants

If you are an existing contractor or consultant, please apply through internal channels.
