Senior Machine Learning Engineer

Bonfy.AI

Mountain View (CA)

Hybrid

USD 90,000 - 150,000

Full time

4 days ago


Job summary

An innovative firm is seeking a passionate engineer to enhance the safety and accountability of AI systems. This role involves designing tools to evaluate LLM behavior and developing metrics that ensure trust in AI applications. You will collaborate across teams to create a cohesive content safety experience while working on cutting-edge technology. With a flexible hybrid schedule and a mission-driven environment, this opportunity allows you to make a meaningful impact in the evolving landscape of AI. Join a team that values clarity and respect, and help shape the future of responsible AI.

Benefits

Generous equity

Flexible hybrid schedule

Health coverage

Vision coverage

Dental coverage

Qualifications

Hands-on experience with modern NLP systems in real-world contexts.
Comfortable moving from prototype to production in Python.

Responsibilities

Design and build tools to evaluate and improve LLM behavior.
Define and evolve trust metrics beyond accuracy.

Skills

NLP systems

Python programming

Debugging skills

Evaluation frameworks

Model interpretability

Bonfy.AI | Mountain View, CA | Hybrid

Security for the Age of AI

About Us

At Bonfy.AI, we’re building the trust layer for generative AI. Our Adaptive Content Security platform detects and mitigates subtle risks baked into large language model (LLM) outputs—before they make it to the user. From hallucinations to hidden data leaks, we help enterprises use GenAI without compromising truth, privacy, or reputation.

We’re model-agnostic, outcome-focused, and unapologetically rigorous. Our customers include Fortune 500 teams deploying LLMs in high-stakes domains—where trust isn't optional.

Why This Role Matters

We’re looking for an engineer who wants to go deeper than metrics—someone who can analyze model behavior, identify subtle failure modes, and build real-time systems that make AI safer to use. You won’t be tuning models for leaderboard glory; you’ll be making them safer, traceable, and accountable. This is a chance to shape the foundation of how the world trusts AI.

What You’ll Do

Design and build tools that evaluate and improve LLM behavior across diverse use cases.
Define, refine, and evolve trust metrics that go beyond accuracy, including traceability, robustness under edge cases (brittleness), and interpretability of model decisions.
Work across teams—infra, product, security—to embed ML insights into a cohesive content safety experience.

What We’re Looking For

Hands-on experience working with modern NLP systems in real-world contexts (LLMs, embeddings, transformers, etc.).
Comfort moving from prototype to production in Python—outside the notebook.
Experience building or working with evaluation frameworks and pipelines.
Practical thinking, sharp debugging skills, and an appetite for ambiguity.

Bonus Points For:

Experience using or building tools that evaluate the behavior of LLMs.
Background in environments where trust, safety, or compliance is critical—even if outside traditional “regulated” industries.
Hands-on experience testing AI systems for edge cases, failure modes, or unexpected behavior.

Why Join Us

You’ll have technical autonomy and direct exposure to customer use cases.
We’re early-stage, well-funded, and mission-driven—your code will shape our trajectory.
We believe in clarity, urgency, and respect. We value what you ship, not how loud you are.
You’ll work with a sharp, kind, high-trust team that knows what’s at stake.

Compensation & Benefits

Competitive salary. Generous equity. Flexible hybrid schedule. Health, vision, and dental coverage. And most importantly: a chance to build something meaningful during the most critical phase of AI’s evolution.

Apply If...

You believe safety isn’t just an add-on—it’s essential to how AI is built.
You understand that trust in AI must be demonstrated through evidence, not assumed by design.
You’re willing to question conventional approaches when they fall short.
You want to contribute meaningfully to the evolution of responsible AI, not just follow established paths.

Bonfy.AI — Truth. Security. Intelligence.
