Job Search and Career Advice Platform

Senior Machine Learning Engineer

SAGE GROUP PLC

Newcastle upon Tyne

On-site

GBP 70,000 - 90,000

Full time

Today

Job summary

A leading tech company in the UK seeks a Senior ML Engineer to take technical ownership of scalable machine learning environments. This role will lead the transition of experimental models into production-grade services, ensuring they are observable and efficient. Responsibilities include designing deployment pipelines, establishing monitoring frameworks, and integrating ML models into products seamlessly. Ideal candidates have a solid engineering background, with hands-on experience in ML systems and proficiency in cloud platforms like AWS.

Qualifications

  • Strong experience delivering machine-learning systems in production.
  • Proven software engineering background with motivation to grow into ML and MLOps.
  • Experience with cloud environments, preferably AWS.

Responsibilities

  • Design and own automated training and deployment pipelines.
  • Lead in adopting software engineering best practices.
  • Establish monitoring frameworks for system health and model metrics.
  • Own strategy for AI cloud spending and optimization.
  • Integrate models into product ecosystems seamlessly.
  • Ensure documentation and audit trails for production deployments.

Skills

Production-quality Python
Version Control
CI/CD practices
API-first design
Cost Management
Observability

Tools

AWS SageMaker
Docker

Job description

Overview

We are looking for a Senior ML Engineer to take technical ownership of our machine learning production environment. You will lead the transition of experimental models into production-grade services that are reliable, scalable, and cost-effective. Your mission is to build the "highway" that allows our data science team to deploy models rapidly while ensuring those models are observable and fiscally responsible. You will own the entire ML lifecycle—from automated training pipelines to real-time inference clusters—and serve as a key software engineering contributor to our AI product stack.

Responsibilities

  • Lifecycle & Pipeline Architecture: Design and own the automated "Continuous Training" (CT) and deployment pipelines. Architect reusable, modular infrastructure for model training and serving, ensuring the entire lifecycle is versioned and reproducible.
  • Software Engineering Best Practices: Lead the team in adopting professional engineering standards. This includes owning the strategy for unit/integration testing, peer code reviews, and applying SOLID principles to ML codebases to ensure they remain modular and maintainable.
  • ML Observability: Establish and own the telemetry framework for the AI stack. Implement proactive monitoring for system health and model-specific metrics, such as data drift, concept drift, and prediction accuracy.
  • FinOps & Cost Management: Own the strategy for AI cloud spend. Build monitoring and alerting frameworks to track compute costs (training and inference) and implement optimization strategies like auto-scaling and spot-instance usage.
  • AI Systems Engineering: Act as a lead software engineer to integrate models into the product ecosystem. Develop high-performance, secure APIs and microservices that wrap our ML capabilities for production consumption.
  • Data & Model Governance: Own the versioning strategy for the "Holy Trinity" of ML: code, data, and model artifacts. Ensure clear documentation and audit trails for all production deployments.
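The drift monitoring called out above can be sketched with a simple statistic such as the population stability index (PSI). This is a stand-alone illustration, not the employer's actual telemetry stack; the bucketing scheme and the rule-of-thumb alert thresholds are assumptions.

```python
import math
from typing import List

def psi(expected: List[float], observed: List[float], buckets: int = 10) -> float:
    """Population stability index between a training (expected) sample and a
    live (observed) sample of one feature; higher values mean more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / buckets or 1.0  # guard against a constant feature

    def fractions(sample: List[float]) -> List[float]:
        counts = [0] * buckets
        for x in sample:
            i = min(int((x - lo) / width), buckets - 1)
            counts[max(i, 0)] += 1
        # floor at a tiny fraction so the log term below stays defined
        return [max(c / len(sample), 1e-6) for c in counts]

    e, o = fractions(expected), fractions(observed)
    return sum((oi - ei) * math.log(oi / ei) for ei, oi in zip(e, o))

# Common rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate, > 0.25 significant.
train = [i / 100 for i in range(100)]               # uniform-ish training feature
live_shifted = [0.5 + i / 200 for i in range(100)]  # live traffic, shifted upward

print(psi(train, train))         # 0.0 — identical distributions, no drift
print(psi(train, live_shifted))  # well above 0.25 — would trigger a drift alert
```

In a production monitoring framework a check like this would run per feature on a schedule, with breaches routed to the same alerting used for system-health metrics.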

Context

  • Demonstrating strong software engineering fundamentals, including production-quality Python, testing, CI/CD practices, and version control
  • Designing and operating reliable, versioned REST APIs using an API-first approach
  • Building, deploying, and operating backend services in cloud environments (AWS as the primary platform; experience with other clouds considered transferable)
  • Using containerisation and modern deployment approaches, including Docker, automated pipelines, and basic observability
  • Working effectively with real-world data and production systems in collaboration with product, data, and platform teams
  • Bringing hands-on experience delivering machine-learning systems in production, or a strong software-engineering background with motivation to grow into ML and MLOps

Desirable skills

  • Using AWS SageMaker for training, deploying, and operating machine-learning workloads, or equivalent experience on similar cloud ML platforms
  • Exposing machine-learning models via APIs (e.g. FastAPI-based inference services) and operating them reliably at scale
  • Applying MLOps practices, including model and version management, monitoring, and handling model or data drift
  • Implementing advanced service patterns such as asynchronous processing, event-driven architectures, or multi-version services
  • Serving LLM or GenAI-based capabilities in production, including model serving, RAG pipelines, and inference controls
  • Designing reusable, platform-level services and shared ML patterns rather than one-off implementations
  • Managing cloud operational trade-offs, including cost efficiency, latency, scalability, and reliability