Job Search and Career Advice Platform

Senior MLOps Platform Architect (AWS | Kubernetes | Terraform)

Salve.Inno Consulting

Remote

EUR 70,000 - 110,000

Full-time

30+ days ago

Job Summary

A leading AI consulting firm is seeking a Senior MLOps/DevOps engineer to build and manage AWS-based infrastructure for AI models. This remote role requires expertise in Kubernetes and Terraform, as well as experience in CI/CD pipeline development. The ideal candidate will have a strong background in deploying ML models and collaborating with ML researchers and engineers. Competitive compensation and a remote-first work model are offered.

Benefits

Competitive fixed compensation
20 days paid time off
Training & development budget
Apple gear

Qualifications

  • 5 years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
  • Strong experience designing and maintaining Kubernetes clusters.
  • Hands-on expertise with Terraform and cloud infrastructure management.

Responsibilities

  • Design and build AWS-based AI/ML infrastructure with Terraform.
  • Architect and operate production Kubernetes clusters.
  • Build automated training and deployment pipelines.

Skills

Kubernetes management
Terraform
Python programming
CI/CD pipeline creation
Cloud infrastructure management

Tools

Docker
GitLab
Jenkins

Job Description

Remote B2B contract, Europe (PL / ES / PT / CZ / CY)

Role Overview

We are hiring a senior MLOps/DevOps/SRE hybrid who can build an entire AI platform infrastructure end-to-end. This is not a research role and not a standard ML Engineer role. If you haven't designed production-grade MLOps infrastructure, haven't built CI/CD for ML, or haven't deployed ML workloads on Kubernetes at scale, this role is not a fit.

You will design, build, and own the AWS-based infrastructure, Kubernetes platform, CI/CD pipelines, and observability stack that support our AI models (Agentic AI, NLU, ASR, Voice Biometrics, TTS). You will be the technical owner of MLOps infrastructure decisions, patterns, and standards.

Key Responsibilities:
MLOps Platform Architecture (from scratch)
  • Design and build AWS-based AI/ML infrastructure using Terraform (required).
  • Define standards for security, automation, cost efficiency, and governance.
  • Architect infrastructure for ML workloads, GPUs/accelerators, scaling, and high availability.
Kubernetes & Model Deployment
  • Architect, build, and operate production Kubernetes clusters.
  • Containerize and productize ML models (Docker, Helm).
  • Deploy latency-sensitive and high-throughput models (ASR, TTS, NLU, Agentic AI).
  • Ensure GPU and accelerator nodes are properly integrated and optimized.
CI/CD for Machine Learning
  • Build automated training, validation, and deployment pipelines (GitLab/Jenkins).
  • Implement canary, blue-green, and automated rollback strategies.
  • Integrate MLOps lifecycle tools (MLflow, Kubeflow, SageMaker Model Registry, etc.).
Observability & Reliability
  • Implement full observability (Prometheus, Grafana).
  • Own uptime, performance, and reliability for ML production services.
  • Establish monitoring for latency, drift, model health, and infrastructure health.
Collaboration & Technical Leadership
  • Work closely with ML engineers, researchers, and data scientists.
  • Translate experimental models into production-ready deployments.
  • Define best practices for MLOps across the company.
Requirements:
  • 5 years in a Senior DevOps, SRE, or MLOps Engineering role supporting production environments.
  • Strong experience designing, building, and maintaining Kubernetes clusters in production.
  • Hands-on expertise with Terraform (or similar IaC tools) to manage cloud infrastructure.
  • Solid programming skills in Python or Go for building automation tooling and ML workflows.
  • Proven experience creating and maintaining CI/CD pipelines (GitLab or Jenkins).
  • Practical experience deploying and supporting ML models in production (e.g., ASR, TTS, NLU, LLM/Agentic AI).
  • Familiarity with ML workflow orchestration tools such as Kubeflow, Apache Airflow, or similar.
  • Experience with experiment tracking and model registry tools (e.g., MLflow, SageMaker Model Registry).
  • Exposure to deploying models on GPUs or specialized hardware (e.g., Inferentia, Trainium).
  • Solid understanding of AWS cloud infrastructure, including networking, scaling, storage, and security best practices.
  • Experience with deployment tooling (Docker, Helm) and observability stacks (Prometheus, Grafana).
Ways to Know You'll Succeed
  • You enjoy building platforms from the ground up and owning technical decisions.
  • You're comfortable collaborating with ML engineers, researchers, and software teams to turn research into stable production systems.
  • You like solving performance, automation, and reliability challenges in distributed systems.
  • You bring a structured, pragmatic, and scalable approach to infrastructure design.
  • Energetic and proactive individual with a natural drive to take initiative and move things forward.
  • Enjoys working closely with people: researchers, ML engineers, cloud architects, and product teams.
  • Comfortable sharing ideas openly, challenging assumptions, and contributing to technical discussions.
  • Collaborative mindset: you like to build together, not work in isolation.
  • Strong ownership mentality: you enjoy taking responsibility for systems end-to-end.
  • Curious, hands-on, and motivated by solving complex technical challenges.
  • Clear communicator who can translate technical work into practical recommendations.
  • Thrives in a fast-paced environment where you can experiment, improve, and shape how things are done.
What's on Offer:
  • Competitive fixed compensation based on experience and expertise.
  • Work on cutting-edge AI systems used globally.
  • Dynamic, multidisciplinary teams engaged in digital transformation.
  • Remote-first work model
  • Long-term B2B contract
  • 20 days paid time off
  • Apple gear
  • Training & development budget
Diversity and Inclusion Commitment

We are dedicated to creating and sustaining an inclusive, respectful workplace for all, regardless of gender, ethnicity, or background. We actively encourage applicants of all identities and experience levels to apply and bring their authentic selves to our fast-paced, supportive team.

Key Skills

Apache Hive, S3, Redshift, Spark, AWS, Solr, NoSQL, Data Warehouse, Internet of Things, Kafka, DynamoDB, ZooKeeper

Employment Type: Full Time

Experience: years

Vacancy: 1
