Software Engineer GPU Kernel

Scout AI

New York (NY)

Remote

USD 90,000 - 160,000

Full time

30+ days ago

Job summary

An innovative hiring platform is seeking an expert engineer to join their R&D team in revolutionizing AI model deployment. This role involves optimizing GPU performance and leading advancements in machine learning initiatives. You will work with cutting-edge technologies, including the Mako Compiler, to enhance AI inference and training across various hardware. If you possess a strong background in CUDA, ROCm, or Triton, along with proficiency in C/C++ and Python, this is a unique opportunity to make a significant impact in a forward-thinking environment. Join a team committed to pushing the boundaries of AI technology and enjoy a range of benefits designed to support your professional growth.

Benefits

Competitive salary package

Performance-based bonuses

Comprehensive health insurance

Flexible working hours

Remote work options

Professional development opportunities

Generous vacation policy

Company-sponsored social activities

Modern work environment

Qualifications

Expertise in CUDA, ROCm, or Triton kernel optimization is essential.
Strong programming skills in C/C++ and Python are required.
Deep understanding of GPU performance optimizations is crucial.

Responsibilities

Analyze performance bottlenecks in ML training and inference.
Optimize high-performance computing kernels in Triton, CUDA, or ROCm.
Collaborate to improve existing ML compilers or frameworks.

Skills

CUDA

ROCm

Triton

C/C++ programming

Python programming

GPU performance optimization

Machine Learning (ML) models

Education

Bachelor's degree in Computer Science

Master's degree in Electrical Engineering

PhD in a related field

Tools

Mako Compiler

MLIR

PyTorch

TensorFlow

ONNX Runtime

TensorRT

Intro

Scout AI is a new hiring platform that connects software engineers to opportunities with world-class companies. On Scout, you get a more relevant, growth-oriented interviewing experience, you receive feedback on your performance, and you get end-to-end support to improve your chances of getting hired.

If you perform well on the Scout interview, you become eligible for opportunities with all companies in the Scout network (you complete the interview only once).

This role is with our partner company that is actively hiring:
Mako

About the company

Mako's AI platform reduces AI compute costs by up to 70%

Our breakthrough technology eliminates the need for expensive and manual GPU optimization, automatically generating high-performance code that runs efficiently on any hardware. Two core capabilities drive immediate business value:

Cost Optimization: Deploy AI models with up to 70% lower computing costs, directly improving your bottom line.

Universal Deployment: Run your existing AI models at peak performance across any GPU infrastructure, eliminating vendor lock-in and scaling constraints.

Mako delivers continuous, automated performance improvements without requiring changes to your existing code or hiring specialized engineers. Our intelligent compiler automatically optimizes your AI workloads 24/7, ensuring you maintain peak efficiency as your models and infrastructure evolve.

Technical Innovation

At the core of our platform is an innovative compiler that leverages hardware-aware deep learning-based search to automatically select from the growing ecosystem of vendor-provided and open-source GPU kernel libraries. Our compiler extends beyond library selection with optimization passes for both vertical and horizontal kernel fusion, enabling the generation of novel kernels outside the original search space.

Our roadmap includes extending the compiler to generate entirely new kernels from scratch. By integrating cutting-edge AI technologies into the compilation pipeline from day one, Mako is pioneering the next generation of modern compilation.

About the role

Summary

Our R&D team is focused on creating the most efficient engine for deploying generative AI models, with efforts ranging from precise GPU kernel tuning to comprehensive system optimizations.

We're looking for an expert-level engineer with a strong background in CUDA, ROCm, or Triton kernel optimization. You will lead substantial improvements in GPU performance and play a key role in pioneering AI and machine learning initiatives.

Our tech

Our team builds software infrastructure for high-performance AI inference and training on any hardware. There are three core components:

Mako Compiler automatically selects, tunes, and generates GPU kernels for any hardware platform
Mako Runtime serves compiled models at high performance
Mako Platform enables users to easily deploy and manage deployments across any cloud (you'll be working on this!)

Responsibilities

Explore and analyze performance bottlenecks in ML training and inference.
Develop and optimize high-performance computing kernels in Triton, CUDA, and/or ROCm.
Implement programming solutions in C/C++ and Python.
Deep dive into GPU performance optimizations to maximize efficiency and speed.
Collaborate with the team to extend and improve existing machine learning compilers or frameworks such as MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT (optional but beneficial).

Qualifications

Bachelor's, Master's, or PhD degree in Computer Science, Electrical Engineering, or a related field.
Strong programming skills in C/C++ and Python.
Deep understanding and experience in GPU performance optimizations.
Proven experience with kernel optimizations on CUDA, ROCm, or other accelerators.
General experience with the training and deployment of ML models.
Experience with distributed systems development or distributed ML workloads.

Bonus Points

Experience with innovative OSS projects such as FlashAttention, mlc-llm, and vLLM.
Experience with machine learning compilers or frameworks such as TVM, MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT.

Our Benefits

Competitive salary package
Performance-based bonuses and incentives
Comprehensive health insurance coverage for you and your family
Flexible working hours and remote work options
Professional development opportunities, including training programs and conferences
Generous vacation and paid time off policy
Company-sponsored social activities and team-building events
Modern and comfortable work environment with state-of-the-art equipment and facilities
