Sr. Software Engineer - AI/ML, AWS Neuron Distributed Training - Performance Optimization

Amazon

Asti

On-site

EUR 127,000 - 221,000

Full-time

Posted 2 days ago

Job description

A leading technology firm seeks a Sr. Software Engineer specializing in AI/ML and performance optimization. The role involves developing, enabling, and performance-tuning training and inference for large ML models, and collaborating with other teams on distributed training solutions. Ideal candidates have more than 5 years of professional software development experience and a strong programming background. The position offers competitive compensation based on location and experience.

Skills

  • 5+ years of non-internship professional software development experience.
  • 5+ years of programming experience in at least one language.
  • Experience leading design or architecture for new and existing systems.

Responsibilities

  • Lead efforts building distributed training and inference support.
  • Tune models for performance and efficiency on AWS hardware.
  • Collaborate on distributed training solutions with various teams.

Knowledge

Software development experience
Programming experience
Design patterns
Mentoring and leadership

Education

Bachelor's degree in computer science or equivalent

Full job description
Overview

This Sr. Software Engineer - AI/ML, AWS Neuron Distributed Training - Performance Optimization role sits within AWS Utility Computing (UC) and Annapurna Labs. The role focuses on the development, enablement, and performance tuning of ML model training and inference on the AWS Neuron stack, including Trn1/Inf1 servers, for large-scale model families and cutting-edge cloud AI services. The candidate will work with a team to enable distributed training and inference across PyTorch, TensorFlow, and JAX using XLA and the Neuron compiler/runtime stack, and will implement and optimize distributed training with libraries such as FSDP and DeepSpeed.

This role is part of the ML Apps team that collaborates with chip architects, compiler engineers and runtime engineers to build, tune and optimize distributed training solutions for Neuron-based systems.

Key responsibilities
  • Lead efforts building distributed training and inference support into PyTorch, TensorFlow, and JAX using XLA and the Neuron stacks.
  • Tune models to achieve the highest performance and efficiency on AWS Trainium and Inferentia silicon and on Trn1/Inf1 servers.
  • Collaborate with chip architects, compiler engineers and runtime engineers to create, build and tune distributed training solutions with Trn1.
  • Develop and enable support for a wide variety of ML model families (e.g., GPT-2, GPT-3 and beyond, Stable Diffusion, Vision Transformers, and more).
  • Train large models with Python and integrate distributed training libraries such as FSDP and DeepSpeed into Neuron-based systems.

Basic Qualifications
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience in at least one software language
  • 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems
  • 5+ years of full software development lifecycle, including coding standards, code reviews, source control, build processes, testing, and operations
  • Experience as a mentor, tech lead or leading an engineering team

Preferred Qualifications
  • Bachelor's degree in computer science or equivalent
  • Knowledge of machine learning frameworks and end-to-end model training

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture supports accommodations for disability during the application and hiring process. For more information, visit the Amazon accommodations page. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several US geographic markets. The base pay range for this position is $151,300/year to $261,500/year; pay is based on location and experience. Amazon is a total compensation company; depending on the role, equity, sign-on payments and other benefits may be provided.

This position will remain posted until filled. Applicants should apply via our internal or external career site.

Posted: May 16, 2025 (Updated about 17 hours ago)

Important FAQs for current Government employees

Before proceeding, please review the following FAQs

https://www.amazon.jobs/en/faqs#faqs-for-us-government-employees