Senior MLOps Engineer

Open Data Science

Dubai

On-site

AED 200,000 - 300,000

Full time

9 days ago

Job summary

A progressive AI startup in Dubai is seeking a Senior MLOps Engineer to design and operate ML infrastructure for large-scale GPU models. The ideal candidate will have hands-on experience in model serving, Kubernetes, and Python, and will contribute to high-performance deployment pipelines and scalable systems in a dynamic environment.

Benefits

Full relocation package

Qualifications

2–3 years of experience with model serving frameworks like Triton or Ray Serve.
3–4 years of experience with Kubernetes and infrastructure-as-code tools.
4–5 years of software engineering experience in Python.

Responsibilities

Architect and maintain scalable ML infrastructure on AWS EKS.
Own end-to-end model deployment pipelines for various AI models.
Design cost-effective auto-scaling serving systems.

Skills

Model serving frameworks expertise

Kubernetes experience

Python programming

Tools

Terraform

Helm

Brief description of the vacancy

We are seeking a Senior MLOps Engineer with proven experience in deploying and managing large-scale ML infrastructure for LLMs, TTS, STT, Stable Diffusion, and other GPU-intensive models in production. You will lead the design and operation of cost-efficient, high-availability, and high-performance serving stacks in a Kubernetes-based AWS environment.

About the company

Company Identity AI Labs

A fast-growing and well-funded AI startup in the UAE. The company's mission is to redefine how humans interact with AI through emotionally intelligent, relationship-focused technology.

Responsibilities

You will architect, deploy, and maintain scalable ML infrastructure on AWS EKS using Terraform and Helm.
You will own end-to-end model deployment pipelines for LLMs, diffusion models (LDM/Stable Diffusion), and other generative/AI models requiring high GPU throughput.
You will design cost-effective, auto-scaling serving systems using tools like Triton Inference Server, vLLM, Ray Serve, or similar model-serving frameworks.
You will build and maintain CI/CD pipelines integrating the ML model lifecycle (training → validation → packaging → deployment).
You will optimize GPU resource utilization and implement job orchestration with frameworks like KServe, Kubeflow, or custom workloads on EKS.
You will deploy and manage FluxCD (or ArgoCD) for GitOps-based deployment and environment promotion.
You will implement robust monitoring, logging, and alerting for model health and infrastructure performance (Prometheus, Grafana, Loki).
You will collaborate closely with ML Engineers and Software Engineers to ensure smooth integration, observability, and feedback loops.

Requirements

2–3 years of experience with model serving frameworks such as Triton, vLLM, Ray Serve, TorchServe, or similar.
2–3 years of experience deploying and optimizing LLMs and LDMs (e.g., Stable Diffusion) under high load with GPU-aware scaling.
3–4 years of experience with Kubernetes (EKS) and infrastructure-as-code (Terraform, Helm).
4–5 years of hands-on software engineering experience in Python, with production-grade experience in the ML model lifecycle.
Nice to have: familiarity with Go or Rust for backend or performance-critical systems.

Working conditions

Full-time job in the Dubai office, with official employment and a full relocation package.

