Principal Software Developer - MLOps Platform

Autodesk

Toronto

On-site

CAD 90,000 - 150,000

Full time

20 days ago

Job summary

An established industry player is seeking a Principal Software Developer to join their innovative AI/ML Platform team. This role involves designing and managing software systems that support the full ML development lifecycle while collaborating with diverse teams to enhance AI capabilities. The ideal candidate will have extensive experience in MLOps, data engineering, and DevOps, with a strong background in building scalable architectures. If you are passionate about AI/ML and eager to make a significant impact in a fast-paced environment, this opportunity is perfect for you.

Qualifications

  • 8+ years in software development, focusing on production systems.
  • Hands-on skills in Golang, Python, and Java for high-quality code.

Responsibilities

  • Design and manage software systems for AI/ML Platform.
  • Collaborate with cross-functional teams to enhance ML capabilities.

Skills

Software Development
MLOps
Data Engineering
DevOps
Golang
Python
Java
Problem-Solving
Communication

Education

Bachelor's degree in Computer Science

Tools

Docker
Kubernetes
Terraform
CI/CD
Spark
Airflow
MLflow
Kubeflow
TensorBoard

Job description

Job Requisition ID #

25WD87574

French translation to follow!

Position Overview

We are looking for an experienced Principal Software Engineer to join our platform team, which focuses on the AI/ML Platform (AMP). This team builds and maintains central components that fast-track the development of new ML/AI models, such as a model development studio, feature store, model serving, and model observability. The ideal candidate has a background in MLOps, data engineering, and DevOps, with experience building high-scale deployment architectures and observability. As an important contributor to our engineering team, you will help shape the future of our AI/ML capabilities and deliver solutions that create value for our organization. You will report to a manager.

Responsibilities

  • System design: Design, implement, and manage software systems for the AI/ML Platform, orchestrating the full ML development lifecycle for partner teams.

  • Mentoring: Spread your knowledge, share best practices, and conduct design reviews to enhance expertise at the team level.

  • Multi-cloud architecture: Define components that leverage strengths from multiple cloud platforms (e.g., AWS, Azure) to optimize performance, cost, and scalability.

  • AI/ML observability: Build systems for monitoring the performance of AI/ML models and extracting insights on the underlying data such as drift detection, data fairness/bias, and anomalies.

  • ML Solution Deployment: Develop tools for building and deploying ML artifacts in production environments, facilitating a smooth transition from development to deployment.

  • Big Data Management: Automate and orchestrate tasks related to managing big data transformation and processing, building large-scale data stores for ML artifacts.

  • Scalable Services: Design and implement low-latency, scalable prediction and inference services to support the diverse needs of our users.

  • Cross-Functional Collaboration: Collaborate across diverse teams, including machine learning researchers, developers, product managers, software architects, and operations, fostering a collaborative and cohesive work environment.

  • End-to-end ownership: Take end-to-end ownership of components, working with other engineers on the team across design, architecture, implementation, rollout, onboarding support for partner teams, production on-call support, testing/verification, and investigations.

Minimum Qualifications

  • Educational Background: Bachelor's degree in Computer Science or equivalent practical experience.

  • Experience: Over 8 years of experience in software development and engineering, delivering production systems and services.

  • Prior experience working with MLOps teams at the intersection of ML model deployment, DevOps, and data engineering.

  • Hands-on skills: Ability to translate designs fluently into high-quality code in Golang, Python, and Java.

  • Knowledge of DevOps practices, containerization, and orchestration tools such as CI/CD, Terraform, Docker, Kubernetes, and GitOps.

  • Demonstrated knowledge of distributed data processing frameworks, orchestrators, and data lake architectures using technologies such as Spark, Airflow, and Iceberg/Parquet formats.

  • Prior collaboration with data science teams to deploy their models and set up ML observability for inference-level monitoring.

  • Exposure to building RAG-based applications in collaboration with product teams and data scientists/AI engineers.

  • Demonstrated creative problem-solving skills with the ability to break down problems into manageable components.

  • Knowledge of AWS and/or Azure for designing large-scale application deployments.

  • Excellent communication and collaboration skills, fostering teamwork and effective information exchange.

Preferred Qualifications

  • Experience integrating with third-party vendors.

  • Experience in latency optimization with the ability to diagnose, tune, and enhance the efficiency of serving systems.

  • Familiarity with tools and frameworks for monitoring and managing the performance of AI/ML models in production (e.g., MLflow, Kubeflow, TensorBoard).

  • Familiarity with distributed model training/inference pipelines using KubeRay or equivalent.

  • Exposure to leveraging GPU computing for AI/ML workloads, including experience with CUDA, OpenCL, or other GPU programming tools, to significantly enhance model training and inference performance.

  • Exposure to ML libraries such as PyTorch, TensorFlow, XGBoost, Pandas, and Scikit-Learn.
