AI / MLOps Engineer (with DevOps)

Business Umbrella

Abu Dhabi

On-site

AED 120,000 - 150,000

Full time

30+ days ago

Job summary

A leading company in the UAE is seeking a skilled DevOps Engineer specializing in MLOps. The successful candidate will be responsible for managing Kubernetes clusters, deploying AI models, and automating CI/CD pipelines. This role involves ensuring the high availability and performance of AI services in production, collaborating with data engineers, and implementing secure access controls. Candidates should have strong scripting skills and experience with monitoring tools.

Qualifications

  • 3 years of experience in DevOps, MLOps, or AI/ML infrastructure roles.
  • Proven experience with Kubernetes or OpenShift in production environments.

Responsibilities

  • Operate and manage Kubernetes or OpenShift clusters for multi-node orchestration.
  • Automate CI/CD pipelines for model packaging and serving.

Skills

Kubernetes
OpenShift
Python
Bash
Linux System Administration

Tools

MLflow
Kubeflow
Prometheus
Grafana
GitLab CI
ArgoCD

Job description

Operate and manage Kubernetes or OpenShift clusters for multi-node orchestration

Deploy and manage LLMs and other AI models for inference using Triton Inference Server or custom endpoints

Automate CI/CD pipelines for model packaging, serving, retraining, and rollback using GitLab CI or ArgoCD

Set up model and infrastructure monitoring systems (Prometheus, Grafana, NVIDIA DCGM)

Implement model drift detection, performance alerting, and inference logging

Manage model checkpoints, reproducibility controls, and rollback strategies

Track deployed model versions using MLflow or equivalent registry tools

Implement secure access controls for model endpoints and data artifacts

Collaborate with AI / Data Engineers to integrate and deploy fine-tuned datasets

Ensure high availability, performance, and observability of all AI services in production


Requirements

3 years of experience in DevOps, MLOps, or AI/ML infrastructure roles

10 years of overall experience with solution operations

Proven experience with Kubernetes or OpenShift in production environments, preferably certified

Familiarity with deploying and scaling PyTorch or TensorFlow models for inference

Experience with CI/CD automation tools on OpenShift / Kubernetes

Hands-on experience with model registry systems (e.g. MLflow, Kubeflow)

Experience with monitoring tools (e.g. Prometheus, Grafana) and GPU workload optimization

Strong scripting skills (Python, Bash) and Linux system administration knowledge

