Member of Technical Staff - Pretraining / Inference Optimization Freiburg (Germany), San Franci[...]

Full time

30+ days ago

Job summary

Join a pioneering startup at the forefront of generative models, where your expertise in GPU optimization and model training will be crucial. This role involves collaborating with a talented research team to enhance pretraining and inference processes, pushing the boundaries of what's possible with cutting-edge technology. You'll be involved in developing innovative strategies and kernel optimizations that maximize performance and efficiency. If you're passionate about advancing AI technology and eager to contribute to groundbreaking projects, this opportunity is perfect for you.

Qualifications

  • Experience optimizing inference and training workloads.
  • Deep understanding of GPU memory hierarchy and computation.
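To give a concrete flavor of the "GPU memory hierarchy and computation" qualification: whether an operation is memory-bound or compute-bound follows from its arithmetic intensity (FLOPs per byte moved) relative to the hardware's ridge point. The sketch below is illustrative only and not part of the posting; the hardware numbers are approximate figures for an NVIDIA A100-class GPU, assumed here for illustration.

```python
# Illustrative sketch: classifying an op as memory- or compute-bound via
# arithmetic intensity. Hardware numbers are approximate A100-class figures
# (assumptions for illustration, not from the posting).

PEAK_FLOPS = 312e12            # ~312 TFLOP/s FP16 tensor-core throughput (approx.)
PEAK_BW = 2.0e12               # ~2 TB/s HBM bandwidth (approx.)
RIDGE = PEAK_FLOPS / PEAK_BW   # intensity above which compute is the limit

def matmul_intensity(m, n, k, bytes_per_elem=2):
    """Arithmetic intensity of an (m,k) @ (k,n) GEMM in FP16."""
    flops = 2 * m * n * k                              # one multiply + one add per MAC
    bytes_moved = bytes_per_elem * (m*k + k*n + m*n)   # read A and B, write C
    return flops / bytes_moved

# A large square GEMM is compute-bound; a GEMV-like shape is memory-bound.
big = matmul_intensity(4096, 4096, 4096)
thin = matmul_intensity(1, 4096, 4096)
print(f"ridge point ~ {RIDGE:.0f} FLOP/byte")
print(f"4096^3 GEMM: {big:.0f} FLOP/byte -> {'compute' if big > RIDGE else 'memory'}-bound")
print(f"1x4096x4096 GEMV: {thin:.1f} FLOP/byte -> {'compute' if thin > RIDGE else 'memory'}-bound")
```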

Responsibilities

  • Develop training strategies for various model sizes and compute loads.
  • Profile and optimize GPU operations for enhanced performance.

Skills

GPU Optimization
Model Training Strategies
Profiling and Debugging
Quantization Techniques
Kernel Optimizations
Attention Algorithms
CUDA Programming

Tools

Nsight
PyTorch

Job description

Member of Technical Staff - Pretraining / Inference Optimization

Remote | Germany | USA

Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team, which invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, is currently seeking a strong researcher/engineer to work closely with our research team on pretraining and inference optimization.

Role:

  • Finding ideal training strategies (parallelism, precision trade-offs) for a variety of model sizes and compute loads
  • Profiling, debugging, and optimizing single and multi-GPU operations using tools such as Nsight or stack trace viewers
  • Reasoning about the speed and quality trade-offs of quantization for model inference
  • Developing and improving low-level kernel optimizations for state-of-the-art inference and training
  • Innovating new ideas that bring us closer to the limits of a GPU
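As a taste of the quantization trade-off mentioned above: int8 weights cut memory traffic roughly 4x versus fp32 (which speeds up memory-bound inference), at the cost of round-trip error that must be reasoned about. The sketch below is a minimal illustration in NumPy, assuming simple symmetric per-tensor quantization; names and shapes are hypothetical, not from the posting.

```python
# Hedged sketch: the quality side of the quantization speed/quality trade-off.
# Symmetric per-tensor int8 quantization of a weight matrix, measuring the
# round-trip error. Shapes and seed are illustrative assumptions.
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    scale = np.abs(w).max() / 127.0    # map the largest |w| to the int8 range
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Relative round-trip error is a first proxy for how much model quality you
# risk in exchange for the ~4x memory-traffic reduction (fp32 -> int8).
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative round-trip error: {rel_err:.4f}")
```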

Ideal Experiences:

  • Familiarity with the latest and most effective techniques for optimizing inference and training workloads
  • Optimizing for both memory-bound and compute-bound operations
  • Understanding GPU memory hierarchy and computation capabilities
  • Deep understanding of efficient attention algorithms
  • Implementing both forward and backward Triton kernels and ensuring their correctness while considering floating point errors
  • Integrating custom-written kernels into PyTorch using, for example, pybind
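Two of the experiences above can be sketched together: the streaming "online softmax" at the heart of FlashAttention-style efficient attention kernels, and validating a kernel against a reference with floating-point tolerances rather than bit-exact comparison. This is a NumPy sketch of the algorithm only (a real implementation would be a Triton or CUDA kernel); block size and tolerances are illustrative assumptions.

```python
# Hedged sketch: one-pass "online softmax" over fixed-size blocks, as used in
# FlashAttention-style attention kernels, checked against a reference within
# floating-point tolerances. NumPy stands in for a Triton/CUDA kernel.
import numpy as np

def softmax_ref(x):
    z = x - x.max()              # numerically stable reference
    e = np.exp(z)
    return e / e.sum()

def softmax_online(x, block=16):
    """Stream over blocks, tracking a running max m and running sum s."""
    m = -np.inf                  # running max
    s = 0.0                      # running sum of exp(x - m)
    for i in range(0, len(x), block):
        xb = x[i:i + block]
        m_new = max(m, xb.max())
        s = s * np.exp(m - m_new) + np.exp(xb - m_new).sum()  # rescale old sum
        m = m_new
    return np.exp(x - m) / s     # lightweight second pass for the output

x = np.random.default_rng(1).standard_normal(1000).astype(np.float32)
out = softmax_online(x)
ref = softmax_ref(x)
# Never compare kernels bit-for-bit: reordered reductions change the rounding.
assert np.allclose(out, ref, rtol=1e-4, atol=1e-8)
print("online softmax matches reference within tolerance")
```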

Nice to have:

  • Experience with Diffusion and Autoregressive models
  • Experience in low-level CUDA kernel optimizations