AI Engineer

numi

Birmingham

On-site

GBP 60,000 - 80,000

Full time

Today

Job summary

A leading HealthTech company in Birmingham seeks an experienced backend engineer to drive the architecture of their AI platform. The role involves managing real-time communications, designing low-latency systems, and optimizing AI agents with a focus on Python and voice technologies. Ideal candidates will have significant experience in backend development and machine learning, along with a startup mindset for rapid iteration. Join a pioneering team that’s reshaping clinician workflows in healthcare.

Qualifications

5+ years of backend or ML engineering experience focusing on LLM application layers.
Expert-level proficiency in Python with FastAPI and asyncio.
Proven track record building low-latency voice or text systems.

Responsibilities

Define the architecture and SLAs for the core reasoning engine.
Implement streaming voice pipelines focusing on VAD and interruptions.
Enhance retrieval signals using hybrid search and query rewriting.

Skills

Backend engineering

Machine learning

Python

Real-time systems

Agentic patterns

Startup environment adaptability

Tools

FastAPI

asyncio

WebRTC

Docker

Kubernetes

Numi are partnered with an early-stage HealthTech company building the next generation of conversational AI for the medical field. Their platform allows clinicians to create controllable, customized AI assistants that automate workflows and support clinical decision-making.

While many tools exist for general transcription, we are solving the "last mile" problem in healthcare: enabling doctors and nurses to build their own deterministic, safe agents using natural language.

The Engineering Challenge

We are building a mission-critical platform where safety, latency, and correctness must coexist in a chaotic real-world environment. Our core technical hurdles include:

Ultra-Low Latency Voice: Delivering sub-2-second response times across full-duplex audio while handling "barge-ins" (interruptions) and natural turn-taking in noisy environments.
User-Defined Agent Logic: Creating a "no-code" engine where users can verbally define workflows and guardrails that the system executes deterministically.
Hierarchical & Auditable RAG: Routing queries through complex layers of patient history, clinical guidelines, and organizational policies with full traceability.
Resilient Orchestration: Managing long-running conversation states, tool usage, and concurrency control to ensure reliability even if external EHR or telephony systems falter.
Safety & Compliance: Enforcing strict scope-of-practice guardrails, PHI redaction, and rapid fallback mechanisms when model confidence drops.
Continuous Improvement: Running a feedback loop that uses automated evaluators and shadow testing to safely evolve the system in production.

The Role

You will own the architecture and evolution of the "Central Brain", the core service that powers our AI agents. You will design multi-agent systems that reason, retrieve data, and communicate via voice and text in real time.

Key Responsibilities

End-to-End Ownership: Define the architecture, SLAs, and error handling for the core reasoning engine.
Real-Time Comms Engineering: Implement streaming voice pipelines, focusing on VAD (Voice Activity Detection), interruption handling, and SIP/WebRTC integrations.
Advanced Agent Orchestration: Build planner-executor patterns and manage shared memory across agents.
Prompt Engineering & Optimization: Use programmatic approaches to compile and iteratively improve prompts based on evaluation metrics.
RAG Optimization: Improve retrieval signal through hybrid search, re-ranking, and query rewriting, ensuring high context precision and recall.
Observability & Evals: Build robust tracing (OpenTelemetry) and automated CI/CD evaluation gates (faithfulness, hallucination detection) to prevent regressions.

Qualifications

Must Haves

5+ years of backend or ML engineering experience, with a recent deep focus on LLM application layers.
Expert-level Python: Deep familiarity with FastAPI, asyncio, and Pydantic.
Real-Time Experience: Proven track record building low-latency voice or text systems (experience with WebRTC, sockets, or similar streaming technologies).
Agentic Patterns: Hands-on experience with ReAct, Chain-of-Thought (CoT), or other reasoning frameworks.
Startup DNA: Ability to ship fast and manage technical debt in a rapidly evolving environment.

Nice to Haves

Experience with DSPy or other programmatic prompt optimizers.
Familiarity with LLM-as-a-judge evaluation setups.
Knowledge of VoIP standards (SIP, SRTP) or modern voice infrastructure (e.g., LiveKit).
Experience with GCP (Cloud Run, GKE) and healthcare data standards.

Tech Stack

Core: Python, FastAPI, Pydantic, asyncio.
Data: Postgres, Redis, vector stores.
Voice/Infra: WebRTC, SIP gateways, Docker, Kubernetes, Terraform.
AI/Ops: OpenTelemetry, custom eval frameworks.
