Research Engineer - Performance Optimization

The Rundown AI, Inc.

Palo Alto (CA)

On-site

USD 180,000 - 250,000

Full time

5 days ago

Job summary

A leading company specializing in cutting-edge AI technologies is seeking engineers with expertise in PyTorch and CUDA. The role involves optimizing and implementing high-performance solutions for large-scale AI models while collaborating with research scientists. Competitive compensation with extensive benefits is offered.

Benefits

Comprehensive benefits plan
Competitive equity packages in the form of stock options

Qualifications

  • Experience training large models using Python and PyTorch.
  • Experience optimizing and deploying inference workloads.
  • Experience with profiling CPU and GPU code in PyTorch.

Responsibilities

  • Ensure efficient implementation of models & systems for data processing and deployment.
  • Identify and implement optimization techniques for distributed systems.
  • Build tools to visualize, evaluate and filter datasets.

Skills

Problem solving
CUDA
Parallel processing
High-performance coding
PyTorch

Tools

Nvidia Nsight
CUDA
Triton
Docker

Job description

We are looking for engineers with significant problem-solving experience in PyTorch, CUDA and distributed systems. You will work with Research Scientists to build and train cutting-edge foundation models on thousands of GPUs.

Responsibilities

  • Ensure efficient implementation of models & systems for data processing, training, inference and deployment

  • Identify and implement optimization techniques for massively parallel and distributed systems

  • Identify and remedy efficiency bottlenecks (memory, speed, utilization) by profiling and implementing high-performance CUDA, Triton, C++ and PyTorch code

  • Work closely with the research team to ensure systems are designed to be as efficient as possible from start to finish

  • Build tools to visualize, evaluate and filter datasets

  • Implement cutting-edge product prototypes based on multimodal generative AI

Experience

  • Experience training large models using Python & PyTorch, including practical experience working with the entire development pipeline, from data processing, preparation & data loading to training and inference.

  • Experience optimizing and deploying inference workloads for throughput and latency across the stack (inputs, model inference, outputs, parallel processing, etc.).

  • Experience with profiling CPU & GPU code in PyTorch, including Nvidia Nsight or similar.

  • Experience writing & improving highly parallel & distributed PyTorch code, and familiarity with DDP, FSDP, Tensor Parallel, etc.

  • Experience writing high-performance parallel C++. A bonus if done in an ML context with PyTorch, such as for data loading, data processing, or inference code.

  • Experience with high-performance Triton / CUDA and writing custom PyTorch kernels. Top candidates will be able to make use of tensor cores, optimize performance through CUDA memory management, and apply similar techniques.

  • Nice to have: experience with deep learning concepts such as Transformers, and with multimodal generative models such as diffusion models and GANs.

  • Nice to have: experience building inference / demo prototype code (incl. Gradio, Docker, etc.).

Compensation

  • The pay range for this position in California is $180,000 - $250,000/yr; however, base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.

