Senior MLOps Platform Architect (AWS | Kubernetes | Terraform)

theHRchapter

Remote

EUR 70,000 - 90,000

Full-time

Posted today

Job summary

A leading HR solutions provider is seeking a Senior MLOps / DevOps / SRE hybrid to build AI platform infrastructure. This role is remote and requires extensive experience with AWS, Kubernetes, and CI/CD pipelines. You will design and implement MLOps solutions from the ground up and ensure the reliability and performance of production systems. The company offers competitive compensation, a training and development budget, and a collaborative work environment, with room to grow as the AI field evolves.

Benefits

Competitive fixed compensation
20+ days paid time off
Apple gear
Training & development budget

Qualifications

  • 5+ years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
  • Experience designing, building, and maintaining Kubernetes clusters in production.
  • Expertise with Terraform to manage cloud infrastructure.

Responsibilities

  • Design and build AWS-based AI/ML infrastructure using Terraform.
  • Architect and build production Kubernetes clusters.
  • Implement full observability and ensure uptime for ML production services.

Skills

AWS-based AI / ML infrastructure
Kubernetes cluster management
Terraform
CI/CD pipeline creation
Python programming
Docker and Helm
Observability tools (Prometheus, Grafana)
ML workflow tools (Kubeflow, MLflow)
Deployment on GPU or specialized hardware

Education

Bachelor's degree in a relevant field

Tools

GitLab
Jenkins

Job description

Your Strategic Partner for HR, Payroll & Headhunting Solutions

🚀 We are hiring a senior MLOps / DevOps / SRE hybrid who can build an entire AI platform infrastructure end-to-end. This is not a research role and not a standard ML Engineer role. If you haven’t designed production-grade MLOps infrastructure, haven’t built CI / CD for ML, or haven’t deployed ML workloads on Kubernetes at scale, this role is not a fit.

Location: Remote - Europe (PL / ES / PT / CZ / CY)

Key Responsibilities

MLOps Platform Architecture (from scratch)

  • Design and build AWS-based AI / ML infrastructure using Terraform (required).
  • Define standards for security, automation, cost efficiency, and governance.
  • Architect infrastructure for ML workloads, GPU / accelerators, scaling, and high availability.

Kubernetes & Model Deployment

  • Architect, build, and operate production Kubernetes clusters.
  • Containerize and productize ML models (Docker, Helm).
  • Deploy latency‑sensitive and high‑throughput models (ASR / TTS / NLU / Agentic AI).
  • Ensure GPU and accelerator nodes are properly integrated and optimized.

CI / CD for Machine Learning

  • Build automated training, validation, and deployment pipelines (GitLab / Jenkins).
  • Implement canary, blue‑green, and automated rollback strategies.
  • Integrate MLOps lifecycle tools (MLflow, Kubeflow, SageMaker Model Registry, etc.).

Observability & Reliability

  • Implement full observability (Prometheus + Grafana).
  • Own uptime, performance, and reliability for ML production services.
  • Establish monitoring for latency, drift, model health, and infrastructure health.

Collaboration & Technical Leadership

  • Work closely with ML engineers, researchers, and data scientists.
  • Translate experimental models into production‑ready deployments.
  • Define best practices for MLOps across the company.

Qualifications and Skills

We’re looking for a senior engineer with a strong DevOps / SRE background who has worked extensively with ML systems in production. The ideal candidate brings a combination of infrastructure, automation, and hands‑on MLOps experience.

  • 5+ years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
  • Strong experience designing, building, and maintaining Kubernetes clusters in production.
  • Hands‑on expertise with Terraform (or similar IaC tools) to manage cloud infrastructure.
  • Solid programming skills in Python or Go for building automation, tooling, and ML workflows.
  • Proven experience creating and maintaining CI / CD pipelines (GitLab or Jenkins).
  • Practical experience deploying and supporting ML models in production (e.g., ASR, TTS, NLU, LLM / Agentic AI).
  • Familiarity with ML workflow orchestration tools such as Kubeflow, Apache Airflow, or similar.
  • Experience with experiment tracking and model registry tools (e.g., MLflow, SageMaker Model Registry).
  • Exposure to deploying models on GPU or specialized hardware (e.g., Inferentia, Trainium).
  • Solid understanding of cloud infrastructure on AWS, including networking, scaling, storage, and security best practices.
  • Experience with deployment tooling (Docker, Helm) and observability stacks (Prometheus, Grafana).

Ways to Know You’ll Succeed

  • You enjoy building platforms from the ground up and owning technical decisions.
  • You’re comfortable collaborating with ML engineers, researchers, and software teams to turn research into stable production systems.
  • You like solving performance, automation, and reliability challenges in distributed systems.
  • You bring a structured, pragmatic, and scalable approach to infrastructure design.
  • You are energetic and proactive, with a natural drive to take initiative and move things forward.
  • You enjoy working closely with people - researchers, ML engineers, cloud architects, product teams.
  • You are comfortable sharing ideas openly, challenging assumptions, and contributing to technical discussions.
  • You have a collaborative mindset: you like to build together, not work in isolation.
  • You have a strong ownership mentality - you enjoy taking responsibility for systems end‑to‑end.
  • You are curious, hands‑on, and motivated by solving complex technical challenges.
  • You are a clear communicator who can translate technical work into practical recommendations.
  • You thrive in a fast‑paced environment where you can experiment, improve, and shape how things are done.

What we offer

  • Competitive fixed compensation based on experience and expertise.
  • Work on cutting‑edge AI systems used globally.
  • Dynamic, multi‑disciplinary teams engaged in digital transformation.
  • Remote‑first work model.
  • Long‑term B2B contract.
  • 20+ days paid time off.
  • Apple gear.
  • Training & development budget.

Our core values at TheHRchapter

  • Transparency: We believe in transparent and smooth recruitment processes. You will get feedback from us.
  • Candidate experience: A blend of automated and humanized recruitment processes. Don’t hesitate to ask us for feedback at any time.
  • Talent pool: We bring highly skilled, motivated candidates to our clients. Our candidates match each client's company values and management style.
  • Diversity and inclusion: There is no place for discrimination and intolerance. We care about diversity awareness and respect for all differences.