
Senior MLOps Platform Architect (AWS | Kubernetes | Terraform)

theHRchapter

Remote

EUR 70,000 - 90,000

Full-time

Today


Job description

A leading HR solutions provider is seeking a Senior MLOps / DevOps / SRE hybrid to build AI platform infrastructure. This role is remote and requires extensive experience with AWS, Kubernetes, and CI/CD pipelines. You will design and implement MLOps solutions from the ground up, ensuring the reliability and performance of production systems. The company offers competitive compensation, training, and a focus on a collaborative work environment, presenting a great opportunity for growth in the evolving AI landscape.

Benefits

Competitive fixed compensation

20+ days paid time off

Apple gear

Training & development budget

Qualifications

5+ years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
Experience designing, building, and maintaining Kubernetes clusters in production.
Expertise with Terraform to manage cloud infrastructure.

Responsibilities

Design and build AWS-based AI/ML infrastructure using Terraform.
Architect and build production Kubernetes clusters.
Implement full observability and ensure uptime for ML production services.

Skills

AWS-based AI / ML infrastructure

Kubernetes cluster management

Terraform

CI/CD pipeline creation

Python programming

Docker and Helm

Observability tools (Prometheus, Grafana)

ML workflow tools (Kubeflow, MLflow)

Deployment on GPU or specialized hardware

Education

Bachelor's degree in a relevant field

Tools

GitLab

Jenkins

Your Strategic Partner for HR, Payroll & Headhunting Solutions

🚀 We are hiring a senior MLOps / DevOps / SRE hybrid who can build an entire AI platform infrastructure end-to-end. This is not a research role and not a standard ML Engineer role. If you haven’t designed production-grade MLOps infrastructure, haven’t built CI / CD for ML, or haven’t deployed ML workloads on Kubernetes at scale, this role is not a fit.

Location: Remote - Europe (PL / ES / PT / CZ / CY)

Key Responsibilities

MLOps Platform Architecture (from scratch)

Design and build AWS-based AI / ML infrastructure using Terraform (required).
Define standards for security, automation, cost efficiency, and governance.
Architect infrastructure for ML workloads, GPU / accelerators, scaling, and high availability.

Kubernetes & Model Deployment

Architect, build, and operate production Kubernetes clusters.
Containerize and productize ML models (Docker, Helm).
Deploy latency‑sensitive and high‑throughput models (ASR / TTS / NLU / Agentic AI).
Ensure GPU and accelerator nodes are properly integrated and optimized.

CI / CD for Machine Learning

Build automated training, validation, and deployment pipelines (GitLab / Jenkins).
Implement canary, blue‑green, and automated rollback strategies.
Integrate MLOps lifecycle tools (MLflow, Kubeflow, SageMaker Model Registry, etc.).

Observability & Reliability

Implement full observability (Prometheus + Grafana).
Own uptime, performance, and reliability for ML production services.
Establish monitoring for latency, drift, model health, and infrastructure health.

Collaboration & Technical Leadership

Work closely with ML engineers, researchers, and data scientists.
Translate experimental models into production‑ready deployments.
Define best practices for MLOps across the company.

Qualifications and Skills

We’re looking for a senior engineer with a strong DevOps / SRE background who has worked extensively with ML systems in production. The ideal candidate brings a combination of infrastructure, automation, and hands‑on MLOps experience.

5+ years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
Strong experience designing, building, and maintaining Kubernetes clusters in production.
Hands‑on expertise with Terraform (or similar IaC tools) to manage cloud infrastructure.
Solid programming skills in Python or Go for building automation, tooling, and ML workflows.
Proven experience creating and maintaining CI / CD pipelines (GitLab or Jenkins).
Practical experience deploying and supporting ML models in production (e.g., ASR, TTS, NLU, LLM / Agentic AI).
Familiarity with ML workflow orchestration tools such as Kubeflow, Apache Airflow, or similar.
Experience with experiment tracking and model registry tools (e.g., MLflow, SageMaker Model Registry).
Exposure to deploying models on GPU or specialized hardware (e.g., Inferentia, Trainium).
Solid understanding of cloud infrastructure on AWS, including networking, scaling, storage, and security best practices.
Experience with deployment tooling (Docker, Helm) and observability stacks (Prometheus, Grafana).

Ways to Know You’ll Succeed

You enjoy building platforms from the ground up and owning technical decisions.
You’re comfortable collaborating with ML engineers, researchers, and software teams to turn research into stable production systems.
You like solving performance, automation, and reliability challenges in distributed systems.
You bring a structured, pragmatic, and scalable approach to infrastructure design.
You are energetic and proactive, with a natural drive to take initiative and move things forward.
You enjoy working closely with people: researchers, ML engineers, cloud architects, and product teams.
You are comfortable sharing ideas openly, challenging assumptions, and contributing to technical discussions.
You have a collaborative mindset: you like to build together, not work in isolation.
You have a strong ownership mentality and enjoy taking responsibility for systems end‑to‑end.
You are curious, hands‑on, and motivated by solving complex technical challenges.
You communicate clearly and can translate technical work into practical recommendations.
You thrive in a fast‑paced environment where you can experiment, improve, and shape how things are done.

What we offer

Competitive fixed compensation based on experience and expertise.
Work on cutting‑edge AI systems used globally.
Dynamic, multi‑disciplinary teams engaged in digital transformation.
Remote‑first work model.
Long‑term B2B contract.
20+ days paid time off.
Apple gear.
Training & development budget.

Our core values at TheHRchapter

Transparency: We believe in transparent and smooth recruitment processes. You will get feedback from us.
Candidate experience: A blend of automated and humanized recruitment processes. Don’t hesitate to ask us for feedback at any time.
Talent pool: We bring highly skilled, motivated candidates to our clients. Our candidates match their company values and management style.
Diversity and inclusion: There is no place for discrimination or intolerance. We care about diversity awareness and respect for all differences.
