ML Research Engineer, ML Systems

Scale AI, Inc.

New York (NY)

On-site

USD 200,000 - 251,000

Full time

10 days ago

Job summary

An innovative firm at the forefront of AI seeks a talented engineer to enhance its ML platform. In this role, you will build and optimize a distributed framework for large language model training, enabling cutting-edge research and development. Collaborating with diverse teams, you'll integrate advanced technologies to improve ML systems. This position offers a competitive salary and a comprehensive benefits package, including equity and health coverage, in a dynamic and inclusive work environment. If you're passionate about shaping the future of AI, we want to hear from you!

Benefits

Health Coverage

Retirement Plans

Learning Stipend

Paid Time Off (PTO)

Qualifications

Experience with multi-node LLM training and inference.
Strong enthusiasm for system optimization.

Responsibilities

Build, profile, and optimize the training and inference framework.
Collaborate with ML teams to accelerate research.

Skills

System Optimization

Multi-node LLM Training

Distributed ML Systems

CUDA

PyTorch

Transformers

Flash Attention

Communication Skills

Scale's ML platform (RLXF) team builds our internal distributed framework for large language model training and inference. The platform lets MLEs, researchers, data scientists, and operators train and evaluate LLMs quickly and automatically, and assess data quality.

Scale is positioned at the core of AI as a provider of training data, evaluation data, and end-to-end ML lifecycle solutions. You will collaborate across Scale's ML teams and researchers to develop the foundational platform supporting all our ML research and development. Your role involves building and optimizing the platform to enable next-generation LLM training, inference, and data curation.

If you are passionate about shaping the future of AI through innovation, we want to hear from you!

You will:

Build, profile, and optimize our training and inference framework
Collaborate with ML teams to accelerate research and enable the development of new models and data curation methods
Research and integrate cutting-edge technologies to enhance our ML systems

Ideally you'd have:

Strong enthusiasm for system optimization
Experience with multi-node LLM training and inference
Experience developing large-scale distributed ML systems
Proficiency in frameworks and tools such as CUDA, PyTorch, Transformers, and FlashAttention
Excellent communication skills and the ability to work effectively in a cross-functional team

Nice to haves:

Expertise in post-training methods and next-generation use cases for large language models, including instruction tuning, RLHF, tool use, reasoning, agents, and multimodal applications

Compensation packages include base salary, equity, and benefits. The salary range for this position in San Francisco, New York, and Seattle is $200,800 to $251,000 USD, depending on location and experience. Benefits include health coverage, retirement plans, a learning stipend, PTO, and potential additional perks.

We are committed to diversity and inclusion, providing accommodations for applicants with disabilities, and complying with pay transparency laws. Our privacy policy explains how we handle personal data collected during the application process.
