Software Architect - Agentic Evals

Datagrid AI

Alameda (CA)

Remote

USD 200,000 - 240,000

Full time

4 days ago

Job summary

A leading AI startup is seeking a seasoned Software Architect to develop test harnesses for evaluating AI agents. This remote role (Bay Area residency required) calls for expertise in B2B software engineering and proficiency in Node.js, along with a strong background in databases and cloud services. The company offers a competitive salary and comprehensive benefits, including equity and health coverage.

Benefits

Equity
100% covered medical, dental and vision
401k

Qualifications

  • 10+ years of B2B software engineering experience.
  • Proficiency with Node.js and frameworks like NestJS or Next.js.
  • Experience with databases such as Weaviate and BigQuery.

Responsibilities

  • Work closely with an ex-Googler to create a harness for evaluating Agent performance.
  • Influence and contribute to the extension of Datagrid’s Agentic capabilities.
  • Integrate publicly available benchmarks into the testing system.

Skills

B2B software engineering
Effective LLM prompts
Node.js
Server-side frameworks
Databases
GCP

Job description

Fully remote, with the exception of occasional meetings in San Francisco to collaborate.

Bay Area residency required.

We believe that everyone deserves their own personal army of AI helpers with deep access to company data to automate any task. Datagrid continuously ingests business data from 100+ sources, makes it all available to AI, and eliminates grunt work: for example, it can categorize 10k support tickets in minutes.

We are a Series A startup headquartered in San Francisco, but we operate as a distributed company. We offer competitive salaries and health benefits, along with equity and respect for work-life balance.

Join our tight-knit team that ships fast and pushes the boundaries of AI! In the last few months, our Agents have learned to use Microsoft Teams, write SQL queries, and automate tasks on complex schedules like “MWF at half past 9”. Our Agents live where people work (Slack, Microsoft Teams, etc.) and automatically take useful actions, such as producing safety reports from worksite photos.

Responsibilities

Datagrid Agents operate where our customers work: across Teams, Slack, and even SMS. Agents make multistep plans, leverage vectorized data from 100+ sources, use tools like DocuSign, and manipulate the Datagrid app. We cannot possibly test all of this manually.

Your job will be to:

  • Work closely with an ex-Googler who built Gemini evals to create a harness for evaluating Agent performance, make that harness available both for local development and in CI/CD pipelines, and set up alerting for when Agents misbehave.
  • Influence and contribute to the extension of Datagrid’s Agentic capabilities.
  • Choose the best open-source and closed-source components to build out the testing infrastructure.
  • Integrate publicly available benchmarks such as RAGBench into the testing system.
  • Enable subject-matter experts to add to the test library using customer queries, manually authored cases, and synthetically generated questions.
  • Expose evaluation results so the company can track Agent improvement over time.

Desired Experience

  • Proven track record of building test harnesses for chat agents from 0 to 1.
  • 10+ years of B2B software engineering experience.
  • Ability to write effective LLM prompts without assistance.
  • Proficiency with Node.js and server-side frameworks such as NestJS or Next.js.
  • Experience with databases such as Weaviate and BigQuery.
  • Experience working with GCP or similar cloud providers.

Salary range: $200k - $240k

Equity

100% covered medical, dental and vision

401k

All candidates for this role will be asked the following interview question: “Work with me to design a system to evaluate the Agent’s performance at SQL queries.” We don’t expect you to have the perfect answer, but we will evaluate your ability to clearly explain your thinking.


Similar jobs

Platform Architect - AWS

Quantiphi

Remote

USD 140,000 - 225,000

3 days ago

Chief Architect DevSecOps

Lockheed Martin

Bethesda

Remote

USD 157,000 - 315,000

7 days ago

Principal AI Architect

goodfin

San Francisco

On-site

USD 180,000 - 350,000

2 days ago

Machine Learning Architect (AWS)

Rackspace Technology

Remote

USD 153,000 - 245,000

10 days ago

Machine Learning Architect (AWS)

Rackspace Technology

San Francisco

Remote

USD 183,000 - 245,000

30+ days ago

Technical Architect - Generative AI

Tata Consultancy Services

San Francisco

On-site

USD 145,000 - 232,000

6 days ago

Senior AIML Platform Engineer

Women In Bio

South San Francisco

On-site

USD 149,000 - 249,000

5 days ago

Principal AI Architect / Head of AI / Machine Learning Architect (LLM Expert)

Jenn Nguyen and Friends

San Francisco

Hybrid

USD 170,000 - 220,000

6 days ago