
Senior Research Engineer - Performance Optimization

Luma AI

Palo Alto (CA)

On-site

USD 180,000 - 250,000

Full time

4 days ago

Job summary

A leading tech company is seeking engineers to work on cutting-edge foundation models using PyTorch and CUDA. The position involves optimizing models for efficiency, collaborating with research teams, and deploying high-performance code. Ideal candidates will have a strong grasp of large model training and distributed systems, along with a passion for AI innovation.

Benefits

  • Equity packages in the form of stock options
  • Comprehensive benefits plan

Qualifications

  • Experience training large models using Python & PyTorch.
  • Experience optimizing and deploying inference workloads.
  • Experience with profiling CPU & GPU code in PyTorch.

Responsibilities

  • Ensure efficient implementation of models & systems for data processing and deployment.
  • Identify and remedy efficiency bottlenecks (memory, speed, utilization).
  • Build tools to visualize, evaluate and filter datasets.

Skills

Problem Solving
CUDA
Distributed Systems
PyTorch
C++
Triton

Job description

We are looking for engineers with significant problem-solving experience in PyTorch, CUDA and distributed systems. You will work with Research Scientists to build & train cutting-edge foundation models on thousands of GPUs.

Responsibilities

  • Ensure efficient implementation of models & systems for data processing, training, inference and deployment
  • Identify and implement optimization techniques for massively parallel and distributed systems
  • Identify and remedy efficiency bottlenecks (memory, speed, utilization) by profiling and implementing high-performance CUDA, Triton, C++ and PyTorch code
  • Work closely with the research team so that systems are designed to be as efficient as possible from start to finish
  • Build tools to visualize, evaluate and filter datasets
  • Implement cutting-edge product prototypes based on multimodal generative AI

Experience

  • Experience training large models using Python & PyTorch, including practical experience working with the entire development pipeline from data processing, preparation & data loading to training and inference.
  • Experience optimizing and deploying inference workloads for throughput and latency across the stack (inputs, model inference, outputs, parallel processing, etc.).
  • Experience profiling CPU & GPU code in PyTorch, including with NVIDIA Nsight or similar tools.
  • Experience writing & improving highly parallel & distributed PyTorch code, with familiarity in DDP, FSDP, Tensor Parallel, etc.
  • Experience writing high-performance parallel C++. A bonus if done in an ML context with PyTorch, e.g. for data loading, data processing, or inference code.
  • Experience with high-performance Triton / CUDA and writing custom PyTorch kernels. Top candidates will be able to utilize tensor cores, optimize performance through careful use of CUDA memory, and apply similar techniques.
  • Nice to have: experience with deep learning concepts such as Transformers and multimodal generative models such as diffusion models and GANs.
  • Nice to have: experience building inference / demo prototype code (incl. Gradio, Docker, etc.).

Compensation

  • The pay range for this position in California is $180,000 - $250,000/yr; however, base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.

$200,000 - $280,000 a year
In addition to cash base pay, you'll also receive a sizable grant of Luma's equity.
The pay range for this position is for the Bay Area. Base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience.
