SDE - ML Engineer, Frontier AI Robotics

Amazon Jobs

San Francisco (CA)

On-site

USD 129,300 - 223,600

Full time

Today

Job summary

A leading technology company in San Francisco is seeking a skilled Machine Learning Systems Engineer to develop and optimize distributed training infrastructures for advanced AI applications. The ideal candidate has extensive experience in software development, strong programming skills in PyTorch, Python, and C++, and a deep understanding of machine learning frameworks. The role offers competitive compensation and the opportunity to work on state-of-the-art AI research.

Qualifications

  • 3+ years of non-internship professional software development experience.
  • 2+ years of non-internship design or architecture experience.
  • Deep understanding of deep learning frameworks.

Responsibilities

  • Design, build, and optimize distributed training infrastructure for ML models.
  • Collaborate with scientists and engineers.
  • Evaluate and implement parallelism techniques.

Skills

PyTorch
Python
C++
Collaboration
Understanding of LLM algorithms
Mathematics and statistics

Education

Bachelor's degree in computer science or equivalent

Job description

Overview

We are seeking a highly skilled Machine Learning Systems Engineer to join the Frontier AI & Robotics team. This role focuses on building and optimizing distributed training infrastructure for large-scale machine learning models, particularly deep learning and transformer-based architectures. You will work closely with scientists and engineers to deliver scalable, high-performance systems that power state-of-the-art AI research and applications.

About the team

At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through frontier foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios.

What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's massive computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations.

Responsibilities

  • Design, build, and optimize distributed training infrastructure for large-scale ML models, with a focus on deep learning and transformer-based architectures.
  • Collaborate with scientists and engineers to deliver scalable, high-performance systems powering AI research and applications.
  • Apply PyTorch, Python, and C++ skills to engineer modular, scalable ML systems.
  • Evaluate and implement parallelism techniques such as data, tensor, model, and pipeline parallelism.
  • Monitor and optimize GPU memory and throughput for efficient training of large models.
  • Collaborate cross-functionally with research and data infrastructure teams to integrate new models and features.

Qualifications

  • 3+ years of non-internship professional software development experience.
  • 2+ years of non-internship design or architecture experience (design patterns, reliability and scaling) of new and existing systems.
  • Experience programming with at least one software language.
  • Experience designing, building, and optimizing machine learning infrastructure for large-scale training and inference.
  • Experience applying PyTorch, Python, and C++ to engineer modular, scalable ML systems.
  • Knowledge of parallelism techniques (data, tensor, model, and pipeline) and the ability to apply them.
  • Ability to monitor and optimize GPU memory and throughput for large models.
  • Strong collaboration skills with cross-functional teams.
  • Deep understanding of LLM algorithms and deep learning frameworks (e.g., PyTorch).
  • Mathematics and statistics background: linear algebra, calculus, probability, and statistics.
  • 3+ years of full software development life cycle experience, including coding standards, code reviews, source control, build processes, testing, and operations.
  • Bachelor's degree in computer science or equivalent.

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Compliance and accommodations

Los Angeles County applicants: job duties include working safely and cooperatively with others; communicating effectively; and following all laws and company policies. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $129,300/year in our lowest geographic market up to $223,600/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Depending on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total package, in addition to a full range of benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.
