
Sr. ML Kernel Performance Engineer, AWS Neuron, Annapurna Labs

Amazon Web Services (AWS)

Toronto

On-site

CAD 100,000 - 130,000

Full time

Yesterday

Job summary

A leading technology company in Toronto is seeking a Senior ML Kernel Performance Engineer to craft high-performance kernels for machine learning workloads on custom accelerators. The ideal candidate will have extensive software development experience, expertise in ML or HPC accelerator architectures, and strong mentoring capabilities. The role offers the opportunity to work on cutting-edge technology within a supportive team culture focused on diversity and personal growth.

Benefits

Work/Life balance with flexible hours
Mentorship and career growth opportunities

Qualifications

  • 5+ years of non-internship professional software development experience.
  • Experience as a mentor, tech lead, or leading an engineering team.
  • Expertise in accelerator architectures for ML or HPC.

Responsibilities

  • Design and implement high-performance compute kernels for ML operations.
  • Analyze and optimize kernel-level performance across multiple generations of Neuron hardware.
  • Conduct detailed performance analysis using profiling tools.

Skills

Software development experience
Programming (any language)
Systems design
Mentoring or leading teams

Education

Bachelor’s degree in computer science or equivalent

Tools

CUDA
OpenCL
TensorFlow
PyTorch

Job description
Overview

Sr. ML Kernel Performance Engineer, AWS Neuron, Annapurna Labs

The Annapurna Labs team at AWS builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon’s custom machine learning accelerators, Inferentia and Trainium. The Acceleration Kernel Library team focuses on maximizing performance for AWS’s custom ML accelerators. This role involves crafting high-performance kernels for ML functions at the hardware-software boundary to ensure optimal performance for demanding workloads. You will work across frameworks, compilers, runtime, and collectives, contributing to future architecture designs and customer enablement. This is an opportunity to work at the intersection of machine learning, high-performance computing, and distributed architectures, shaping the future of AI acceleration technology.

This is a chance to work on cutting-edge products, architect and implement business-critical features, publish research, and mentor engineers in a small, agile team that values experimentation and learning. The team collaborates closely with customers on model enablement, providing optimization expertise for ML workloads on AWS accelerators.

Responsibilities
  • Design and implement high-performance compute kernels for ML operations, leveraging the Neuron architecture and programming models
  • Analyze and optimize kernel-level performance across multiple generations of Neuron hardware
  • Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks
  • Implement compiler optimizations such as fusion, sharding, tiling, and scheduling
  • Work directly with customers to enable and optimize their ML models on AWS accelerators
  • Collaborate across teams to develop innovative kernel optimization techniques
A day in the life

As you design and code solutions to drive efficiencies in software architecture, you’ll create metrics, implement automation and other improvements, and resolve root causes of software defects. You’ll build high-impact solutions for a large customer base, participate in design discussions and code reviews, and work cross-functionally to drive business decisions with your technical input. You’ll thrive in a startup-like development environment focused on the most important work.

About the team
  • Diversity of experiences is valued; candidates not meeting every qualification are encouraged to apply.
  • Why AWS: AWS is a leading cloud platform trusted by startups to Global 500 companies.
  • Inclusive team culture with employee affinity groups and leadership principles guiding collaboration.
  • Work/Life balance with flexible hours.
  • Mentorship and career growth opportunities.
Basic Qualifications
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming with at least one programming language
  • 5+ years of leading design or architecture of systems
  • Experience as a mentor, tech lead, or leading an engineering team
Preferred Qualifications
  • 5+ years of full software development lifecycle experience
  • Bachelor’s degree in computer science or equivalent
  • Expertise in accelerator architectures for ML or HPC (GPUs, CPUs, FPGAs, or custom)
  • Experience with GPU kernels and backends (CUDA, OpenCL, SYCL, ROCm, etc.)
  • Experience with NVIDIA PTX and/or AMD GPU ISA
  • Experience developing high-performance libraries for HPC
  • Proficiency in low-level GPU performance optimization
  • Experience with LLVM / MLIR backend development for GPUs
  • Knowledge of ML frameworks (PyTorch, TensorFlow) and their GPU backends
  • Experience with parallel programming and optimization techniques
  • Understanding of GPU memory hierarchies and optimization strategies

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status. If you require a workplace accommodation during the application or hiring process, please visit amazon.jobs/accommodations for more information.

Company - Amazon Development Centre Canada ULC

Seniority level

  • Mid-Senior level

Employment type

  • Full-time

Job function

  • Information Technology, Consulting, and Engineering

Industries

  • IT Services and IT Consulting
