Machine Learning Engineer, GenAI Quality

Scale AI

New York, San Francisco (NY, CA)

On-site

USD 172,000 - 300,000

Full time

23 days ago

Job summary

An established industry player is seeking a Machine Learning Engineer to join its innovative team focused on Generative AI. This role involves designing and fine-tuning large language models to automate data quality evaluation and generation. You will collaborate with cross-functional teams to build scalable ML services that enhance training data quality, directly impacting the development of cutting-edge AI solutions. If you're passionate about shaping the future of AI and enjoy working in a dynamic environment, this opportunity is perfect for you. Join a team that values inclusivity and innovation while driving significant advancements in the AI landscape.

Benefits

Comprehensive health coverage
Dental and vision coverage
Retirement benefits
Learning and development stipend
Generous PTO
Commuter stipend

Qualifications

  • 3+ years of experience in ML model design and deployment.
  • Strong background in NLP and deep learning frameworks.

Responsibilities

  • Design and evaluate large language models for data generation.
  • Develop frameworks to assess performance across critical dimensions.

Skills

Machine Learning
Natural Language Processing (NLP)
Deep Learning
Python
Communication Skills

Education

Bachelor's Degree in Computer Science or related field

Tools

PyTorch
TensorFlow
JAX
AWS
GCP

Job description

About Scale:
Scale’s Generative AI ML team develops models and services to power high-quality data generation and evaluation for the most advanced large language models on earth. We also conduct applied research on model supervision and algorithmic approaches that support frontier models for Scale’s applied-ML teams and the broader AI community. Scale is uniquely positioned at the center of the AI ecosystem as a leading provider of training and evaluation data, end-to-end ML lifecycle solutions, and frontier evaluations for public and private institutions.

About The Role:
This role focuses on developing ML systems to automate data quality evaluation and generation using large language models. You’ll build scalable systems to assess quality across accuracy, instruction adherence, factuality, and reasoning — and design robust evaluation frameworks to ensure alignment with human standards. This is one of the highest impact areas in the company and directly accelerates the development of aligned, performant foundation models.

You’ll be deeply involved in the full lifecycle: from model design and fine-tuning, to prototyping, deployment, and monitoring. You’ll partner closely with engineering, research, and product teams to deliver cutting-edge solutions for both customers and internal GenAI data engines — Scale’s fastest-growing business.

If you’re excited about combining human-machine evaluation, scaling high-quality training data, and shaping the next generation of foundation models, we’d love to hear from you.

You will:

  • Design, fine-tune, and evaluate large language models for structured quality evaluation and data generation tasks
  • Develop robust evaluation frameworks to assess performance across accuracy, instruction following, reasoning, and other critical dimensions
  • Build and maintain scalable ML services to automatically assess and generate high-quality training and evaluation data
  • Research and apply state-of-the-art techniques in LLM training, post-training alignment (e.g., instruction tuning, RLHF), and tool-augmented reasoning
  • Collaborate with research scientists, engineers, and product teams to integrate your work into production services used by top AI developers

Ideally you’d have:

  • 3+ years of experience designing, training, and deploying ML models in production environments
  • Strong background in NLP, LLMs, and deep learning frameworks like PyTorch, TensorFlow, or JAX
  • Experience building microservices and deploying ML pipelines in cloud environments (e.g., AWS or GCP)
  • Practical knowledge of LLM fine-tuning and evaluation for tasks like factuality, instruction adherence, and chain-of-thought reasoning
  • Strong programming skills (e.g., Python) and a solid foundation in algorithms and data structures
  • Strong communication skills and experience working cross-functionally

Nice to haves:

  • Experience with post-training LLM techniques (instruction tuning, RLHF, tool use, or agent-based reasoning)
  • Familiarity with data evaluation pipelines, dataset curation, or scalable annotation workflows
  • Background in multimodal ML or model evaluation across domains such as code or long-context generation

Compensation: Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Directors approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process and confirm whether the hired role will be eligible for an equity grant. You’ll also receive benefits including, but not limited to, comprehensive health, dental, and vision coverage, retirement benefits, a learning and development stipend, and generous PTO. This role may also be eligible for additional benefits such as a commuter stipend.

Salary Range: Please reference the job posting's subtitle for where this position will be located. For pay transparency purposes, the base salary range for this full-time position in San Francisco, New York, and Seattle is $172,000 - $300,000 USD.

About Us:
At Scale, we believe that the transition from traditional software to AI is one of the most important shifts of our time. Our mission is to make that happen faster across every industry, and our team is transforming how organizations build and deploy AI. Our products power the world's most advanced LLMs, generative models, and computer vision models. We are trusted by generative AI companies such as OpenAI, Meta, and Microsoft, government agencies like the U.S. Army and U.S. Air Force, and enterprises including GM and Accenture. We are expanding our team to accelerate the development of AI applications.

We believe that everyone should be able to bring their whole selves to work, which is why we are proud to be an inclusive and equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability status, gender identity, or veteran status.
