AI Training Infrastructure Engineer - Post Training

Tbwa Chiat/Day Inc

San Francisco (CA)

Hybrid

USD 220,000 - 290,000

Full time

30+ days ago

Job summary

An innovative firm is seeking experienced AI Research Engineers and Scientists to enhance their in-house Online LLMs. In this exciting role, you will develop a robust training framework using advanced technologies like Megatron and PyTorch. Your contributions will be pivotal in scaling model training jobs and integrating cutting-edge AI models into products. Join a rapidly growing team that has revolutionized conversational AI, and be part of a dynamic environment where your expertise will drive impactful solutions. If you're passionate about AI and eager to tackle challenging problems, this opportunity is perfect for you.

Benefits

Comprehensive health insurance
Dental insurance
Vision insurance
401(k) plan
Equity options

Qualifications

  • 6+ years of experience with large-scale LLM frameworks.
  • Strong expertise in Python and PyTorch for model training.

Responsibilities

  • Build a post-training framework for large-scale model training.
  • Implement infrastructure for the latest models and algorithms.

Skills

Python
PyTorch
C++
CUDA
Problem-solving

Education

PhD in AI/ML/Systems or related areas

Tools

Megatron

Job description

Perplexity is seeking experienced AI Research Engineers and Scientists to continue to improve our in-house Online LLMs, the Sonar models. Your job is to work with the team and create a robust and effective training framework (on top of Megatron/PyTorch), especially for post-training LLMs.

Responsibilities

  • Build a post-training framework that can run cutting-edge model training jobs at scale.
  • Implement the necessary infrastructure and components to support the latest models and algorithms like SFT, RL (DPO/GRPO), and more.
  • Own the full-stack data, training, and evaluation pipelines required to post-train LLMs.
  • Work closely with engineering teams to integrate Sonar models into our product.

Qualifications

  • Proven experience building large-scale LLM frameworks.
  • Strong in Python/PyTorch; C++/CUDA is a plus.
  • Self-starter with a willingness to take ownership of tasks.
  • Passion for tackling challenging problems.
  • Minimum of 6 years of experience working on relevant projects.

Bonus

  • PhD in AI/ML/Systems or related areas.
  • Experience building LLM training frameworks, especially post-training.

The cash compensation range for this role is $220,000 - $290,000.

At Perplexity, we've experienced tremendous growth and adoption since publicly launching the world's first fully functional conversational answer engine in 2022. We've grown from answering 2.5 million questions per day at the start of 2024 to around 20 million daily queries in December 2024.

Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above.

Equity: In addition to the base salary, equity is part of the total compensation package.
Benefits: Comprehensive health, dental, and vision insurance for you and your dependents, plus a 401(k) plan.

