
Sr. ML Kernel Performance Engineer, AWS Neuron, Annapurna Labs

Amazon Web Services (AWS)

Toronto

On-site

CAD 100,000 - 130,000

Full time

Yesterday

Job summary

A leading technology company in Toronto is seeking a Senior ML Kernel Performance Engineer to craft high-performance kernels for machine learning workloads on custom accelerators. The ideal candidate will have extensive software development experience, expertise in ML or HPC accelerator architectures, and strong mentoring capabilities. The role offers the opportunity to work on cutting-edge technology within a supportive team culture focused on diversity and personal growth.

Benefits

Work/Life balance with flexible hours
Mentorship and career growth opportunities

Qualifications

  • 5+ years of non-internship professional software development experience.
  • Experience as a mentor, tech lead, or leading an engineering team.
  • Expertise in accelerator architectures for ML or HPC.

Responsibilities

  • Design and implement high-performance compute kernels for ML operations.
  • Analyze and optimize kernel-level performance across multiple generations of Neuron hardware.
  • Conduct detailed performance analysis using profiling tools.

Skills

Software development experience
Programming (any language)
Systems design
Mentoring or leading teams

Education

Bachelor’s degree in computer science or equivalent

Tools

CUDA
OpenCL
TensorFlow
PyTorch

Job description
Overview

Sr. ML Kernel Performance Engineer, AWS Neuron, Annapurna Labs

The Annapurna Labs team at AWS builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon’s custom machine learning accelerators, Inferentia and Trainium. The Acceleration Kernel Library team focuses on maximizing performance for AWS’s custom ML accelerators. This role involves crafting high-performance kernels for ML functions at the hardware-software boundary to ensure optimal performance for demanding workloads. You will work across frameworks, compilers, runtime, and collectives, contributing to future architecture designs and customer enablement. This is an opportunity to work at the intersection of machine learning, high-performance computing, and distributed architectures, shaping the future of AI acceleration technology.

This is a chance to work on cutting-edge products, architect and implement business-critical features, publish research, and mentor engineers in a small, agile team that values experimentation and learning. The team collaborates closely with customers on model enablement, providing optimization expertise for ML workloads on AWS accelerators.

Responsibilities
  • Design and implement high-performance compute kernels for ML operations, leveraging the Neuron architecture and programming models
  • Analyze and optimize kernel-level performance across multiple generations of Neuron hardware
  • Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks
  • Implement compiler optimizations such as fusion, sharding, tiling, and scheduling
  • Work directly with customers to enable and optimize their ML models on AWS accelerators
  • Collaborate across teams to develop innovative kernel optimization techniques
A day in the life

As you design and code solutions to drive efficiencies in software architecture, you’ll create metrics, implement automation and other improvements, and resolve root causes of software defects. You’ll build high-impact solutions for a large customer base, participate in design discussions and code reviews, and work cross-functionally to drive business decisions with your technical input. You’ll thrive in a startup-like development environment focused on the most important work.

About the team
  • Diversity of experiences is valued; candidates not meeting every qualification are encouraged to apply.
  • Why AWS: AWS is a leading cloud platform trusted by startups to Global 500 companies.
  • Inclusive team culture with employee affinity groups and leadership principles guiding collaboration.
  • Work/Life balance with flexible hours.
  • Mentorship and career growth opportunities.
Basic Qualifications
  • 5+ years of non-internship professional software development experience
  • 5+ years of programming with at least one programming language
  • 5+ years of leading design or architecture of systems
  • Experience as a mentor, tech lead, or leading an engineering team
Preferred Qualifications
  • 5+ years of full software development lifecycle experience
  • Bachelor’s degree in computer science or equivalent
  • Expertise in accelerator architectures for ML or HPC (GPUs, CPUs, FPGAs, or custom)
  • Experience with GPU kernels and backends (CUDA, OpenCL, SYCL, ROCm, etc.)
  • Experience with NVIDIA PTX and/or AMD GPU ISA
  • Experience developing high-performance libraries for HPC
  • Proficiency in low-level GPU performance optimization
  • Experience with LLVM / MLIR backend development for GPUs
  • Knowledge of ML frameworks (PyTorch, TensorFlow) and their GPU backends
  • Experience with parallel programming and optimization techniques
  • Understanding of GPU memory hierarchies and optimization strategies

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status. If you require a workplace accommodation during the application or hiring process, please visit amazon.jobs/accommodations for more information.

Company - Amazon Development Centre Canada ULC

Seniority level

  • Mid-Senior level

Employment type

  • Full-time

Job function

  • Information Technology, Consulting, and Engineering

Industries

  • IT Services and IT Consulting
