Senior AI Engineer

Veridox

Remote

GBP 100,000 - 125,000

Full time

14 days ago

Job summary

An AI-focused tech company in the UK is seeking a Senior AI Engineer driven by performance to lead the development of LLM and RAG pipelines. The role requires building efficient AI features and a strong statistical evaluation background. Candidates should demonstrate proficiency in Python and prior experience in production systems. This is a remote position aimed at professionals who are passionate about applying AI to impactful solutions within fraud detection.

Qualifications

  • Proven experience in LLM and RAG engineering in production is essential.
  • Strong understanding of statistical evaluation and evaluation metrics.
  • Excellent communicator with the ability to troubleshoot and resolve issues.

Responsibilities

  • Lead the development and optimisation of LLM and RAG pipelines.
  • Curate and own the Golden Dataset for model evaluation.
  • Automate evaluation processes and track key performance indicators.

Skills

Building LLM/RAG pipelines
Statistical evaluation
Communication skills
Understanding of unit economics
Experience with Python

Tools

AWS
OpenSearch
CI/CD pipelines

Job description

Role

Senior AI Engineer (LLMOps & RAG)

Location

Remote (UK-based preferred)

Type

Full-time

Compensation

Competitive

About Veridox

Veridox is an AI-driven fraud detection platform purpose-built for insurers. We combine document analysis with contextual intelligence to produce detailed risk analysis, with a strong focus on trust, accuracy, and explainability. As part of our growing team, you’ll play a key role in scaling the technical vision that powers our platform.

The Role

We’re looking for a hands‑on, delivery‑first engineer to lead the development and optimisation of our LLM and RAG pipelines. This isn’t a research role. You’ll be responsible for building, benchmarking, and deploying high‑performance, cost‑efficient AI features that work, and improve, in production.

We’re not looking for 100-page white papers. We’re looking for someone who can ship features, track performance, and find novel solutions to customers’ problems.

What You’ll Do

  • Build and optimise RAG pipelines using AWS Bedrock, OpenSearch, and vector stores
  • Own our “Golden Dataset”, curating the truth‑set we use to evaluate model output
  • Automate evaluation using tools like RAGAS, DeepEval, or custom “LLM‑as‑a‑judge” logic
  • Track drift, hallucination, and cost using observability tooling (Arize, Phoenix, etc.)
  • Design self‑improving systems where user interaction data flows back into future retrieval / ranking
  • Balance cost and performance by selecting the right model for the right task (Claude, SLMs, or whatever gets the job done)
  • Write clean and fast Python and ship infrastructure as code

Tech Stack

If your experience covers a mix of the platforms and technologies below, we'd like to hear from you.

  • Languages: Python, TypeScript, HCL
  • Vector & Search: OpenSearch, AWS S3 Vectors
  • Observability & Evaluation: Arize, Phoenix, RAGAS, DeepEval
  • Infrastructure: AWS Step Functions, Azure Function Apps
  • DevOps: CI/CD pipelines (Bitbucket)

What We’re Looking For

  • Proven experience building LLM/RAG pipelines in production
  • Confidence in statistical evaluation (sample sizes, regression testing)
  • Ability to define evaluation metrics and continuously improve model outputs
  • Strong understanding of unit economics in LLM systems (token cost, latency, accuracy trade‑offs)
  • Clear communicator who can flag blockers early and ship fast

Nice‑to‑Have

  • Experience with AWS S3 vector store or similar
  • Familiarity with AI‑driven fraud detection, legal tech, or investigative tools
  • Prior work with small language models (7B–8B) for cost‑effective inference

Why Join Us?

You will work on a system where evaluation is central to the product. You’ll have the autonomy to define standards for building, measuring, and improving complex AI systems.

If you care about rigour, impact, and building things that matter, we’d love to hear from you.
