Member of Technical Staff - Pretraining / Inference Optimization Freiburg (Germany), San Franci[...]

Full time

30+ days ago

Job summary

Join a pioneering startup at the forefront of generative models, where your expertise in GPU optimization and model training will be crucial. This role involves collaborating with a talented research team to enhance pretraining and inference processes, pushing the boundaries of what's possible with cutting-edge technology. You'll be involved in developing innovative strategies and kernel optimizations that maximize performance and efficiency. If you're passionate about advancing AI technology and eager to contribute to groundbreaking projects, this opportunity is perfect for you.

Qualifications

  • Experience optimizing inference and training workloads.
  • Deep understanding of GPU memory hierarchy and computation.
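To give a concrete flavor of the "GPU memory hierarchy and computation" qualification: whether an operation is memory-bound or compute-bound follows from its arithmetic intensity (FLOPs per byte moved) relative to the hardware's ridge point. The sketch below is illustrative only and not part of the posting; the hardware numbers are approximate figures for an NVIDIA A100-class GPU, assumed here for illustration.

```python
# Illustrative sketch: classifying an op as memory- or compute-bound via
# arithmetic intensity. Hardware numbers are approximate A100-class figures
# (assumptions for illustration, not from the posting).

PEAK_FLOPS = 312e12            # ~312 TFLOP/s FP16 tensor-core throughput (approx.)
PEAK_BW = 2.0e12               # ~2 TB/s HBM bandwidth (approx.)
RIDGE = PEAK_FLOPS / PEAK_BW   # intensity above which compute is the limit

def matmul_intensity(m, n, k, bytes_per_elem=2):
    """Arithmetic intensity of an (m,k) @ (k,n) GEMM in FP16."""
    flops = 2 * m * n * k                              # one multiply + one add per MAC
    bytes_moved = bytes_per_elem * (m*k + k*n + m*n)   # read A and B, write C
    return flops / bytes_moved

# A large square GEMM is compute-bound; a GEMV-like shape is memory-bound.
big = matmul_intensity(4096, 4096, 4096)
thin = matmul_intensity(1, 4096, 4096)
print(f"ridge point ~ {RIDGE:.0f} FLOP/byte")
print(f"4096^3 GEMM: {big:.0f} FLOP/byte -> {'compute' if big > RIDGE else 'memory'}-bound")
print(f"1x4096x4096 GEMV: {thin:.1f} FLOP/byte -> {'compute' if thin > RIDGE else 'memory'}-bound")
```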

Responsibilities

  • Develop training strategies for various model sizes and compute loads.
  • Profile and optimize GPU operations for enhanced performance.

Skills

GPU Optimization
Model Training Strategies
Profiling and Debugging
Quantization Techniques
Kernel Optimizations
Attention Algorithms
CUDA Programming

Tools

Nsight
PyTorch

Job description

Member of Technical Staff - Pretraining / Inference Optimization

Remote | Germany | USA

Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team, which invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, is currently seeking a strong researcher/engineer to work closely with our research team on pretraining and inference optimization.

Role:

  • Finding ideal training strategies (parallelism, precision trade-offs) for a variety of model sizes and compute loads
  • Profiling, debugging, and optimizing single and multi-GPU operations using tools such as Nsight or stack trace viewers
  • Reasoning about the speed and quality trade-offs of quantization for model inference
  • Developing and improving low-level kernel optimizations for state-of-the-art inference and training
  • Innovating new ideas that bring us closer to the limits of a GPU
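As a taste of the quantization trade-off mentioned above: int8 weights cut memory traffic roughly 4x versus fp32 (which speeds up memory-bound inference), at the cost of round-trip error that must be reasoned about. The sketch below is a minimal illustration in NumPy, assuming simple symmetric per-tensor quantization; names and shapes are hypothetical, not from the posting.

```python
# Hedged sketch: the quality side of the quantization speed/quality trade-off.
# Symmetric per-tensor int8 quantization of a weight matrix, measuring the
# round-trip error. Shapes and seed are illustrative assumptions.
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    scale = np.abs(w).max() / 127.0    # map the largest |w| to the int8 range
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Relative round-trip error is a first proxy for how much model quality you
# risk in exchange for the ~4x memory-traffic reduction (fp32 -> int8).
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative round-trip error: {rel_err:.4f}")
```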

Ideal Experiences:

  • Familiarity with the latest and most effective techniques for optimizing inference and training workloads
  • Optimizing for both memory-bound and compute-bound operations
  • Understanding GPU memory hierarchy and computation capabilities
  • Deep understanding of efficient attention algorithms
  • Implementing both forward and backward Triton kernels and ensuring their correctness while considering floating point errors
  • Integrating custom-written kernels into PyTorch using, for example, pybind
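Two of the experiences above can be sketched together: the streaming "online softmax" at the heart of FlashAttention-style efficient attention kernels, and validating a kernel against a reference with floating-point tolerances rather than bit-exact comparison. This is a NumPy sketch of the algorithm only (a real implementation would be a Triton or CUDA kernel); block size and tolerances are illustrative assumptions.

```python
# Hedged sketch: one-pass "online softmax" over fixed-size blocks, as used in
# FlashAttention-style attention kernels, checked against a reference within
# floating-point tolerances. NumPy stands in for a Triton/CUDA kernel.
import numpy as np

def softmax_ref(x):
    z = x - x.max()              # numerically stable reference
    e = np.exp(z)
    return e / e.sum()

def softmax_online(x, block=16):
    """Stream over blocks, tracking a running max m and running sum s."""
    m = -np.inf                  # running max
    s = 0.0                      # running sum of exp(x - m)
    for i in range(0, len(x), block):
        xb = x[i:i + block]
        m_new = max(m, xb.max())
        s = s * np.exp(m - m_new) + np.exp(xb - m_new).sum()  # rescale old sum
        m = m_new
    return np.exp(x - m) / s     # lightweight second pass for the output

x = np.random.default_rng(1).standard_normal(1000).astype(np.float32)
out = softmax_online(x)
ref = softmax_ref(x)
# Never compare kernels bit-for-bit: reordered reductions change the rounding.
assert np.allclose(out, ref, rtol=1e-4, atol=1e-8)
print("online softmax matches reference within tolerance")
```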

Nice to have:

  • Experience with Diffusion and Autoregressive models
  • Experience in low-level CUDA kernel optimizations