Software Engineer - AI Evals and Test

P-1 AI

United States

Remote

USD 100,000 - 150,000

Full time

30 days ago

Job summary

Join a leading company in AI development as a Software Engineer focused on evals and testing. You will be responsible for developing benchmarks for AI systems, collaborating with experts, and ensuring effective testing processes. This remote role offers the opportunity to impact the future of engineering AI while working with a talented team.

Qualifications

  • Experience constructing comprehensive test suites for software and AI systems.
  • Experience designing metrics to evaluate and visualize system performance.
  • Proficiency in Python and modern development tools.

Responsibilities

  • Implement systems for organizing and running eval benchmarks.
  • Collaborate with partners to gather and refine evals.
  • Lead implementation of automated tests across technology stacks.

Skills

Test suite construction
System performance metrics
Communication skills
Proficiency in Python
CI/CD practices
Ability to thrive in a startup environment

Job description

We are building an engineering AGI. We founded P-1 AI with the conviction that the greatest impact of artificial intelligence will be on the built world—helping mankind conquer nature and bend it to our will. Our first product is Archie, an AI engineer capable of quantitative and spatial reasoning over physical product domains that performs at the level of an entry-level design engineer. We aim to put an Archie on every engineering team at every industrial company on earth.

Our founding team includes top minds in model-based engineering, deep learning, and the industries we serve. We recently closed a $23 million seed round led by Radical Ventures and are building AI's most impactful use case. Join our team of world-class engineers and AI researchers to make a significant impact.

About The Role

In this role, you'll be responsible for developing evals that ensure Archie is learning and retaining the necessary skills, and for benchmarking it against industry standards. Working within a small, high-performing team, you'll define, implement, and validate these evals, incorporating input from engineering experts and industrial partners. You will also translate these evals into multiple formats for use with various AI and non-AI systems and agents.

This is a remote role, open to candidates based anywhere in the US or Canada who have existing work authorization. You are expected to travel to our San Francisco office approximately once every six weeks for co-working sessions. If you're located in the SF Bay Area or interested in relocating, you may work from our SF office.

Responsibilities

  • Implement systems for organizing, transforming, running, grading, and reporting on eval benchmarks.
  • Ensure evals run effectively within our CI/CD system, continuously benchmarking our evolving AI platform.
  • Collaborate with industrial partners, AI teams, and engineering experts to gather and refine evals.
  • Create methods to detect and test for common AI quality issues, such as hallucinations, stochasticity, and regressions.
  • Lead the consistent implementation and organization of automated tests across our technology stacks.

Skills

  • Experience constructing comprehensive test suites for software and AI systems.
  • Experience designing metrics to evaluate and visualize system performance, including across generations.
  • Experience with evals against LLM-based systems is a strong plus.
  • Good communication skills with diverse stakeholders.
  • Proficiency in Python, modern development tools, and practices (Git, CI/CD).
  • Ability to thrive in a fast-paced startup environment.

Interview Process

  • Initial screening with Head of Talent (30 mins)
  • Hiring manager interview with co-founder & Head of Engineering (45 mins)
  • Programming interview with technical staff & Head of Engineering (60 mins)
  • Culture fit / Q&A with co-founder & CEO (45 mins)

Additional Details

  • Seniority level: Entry level
  • Employment type: Full-time
  • Job function: Engineering and IT
  • Industry: Software Development
