MLOps Engineer

Flow Talent

United Arab Emirates

On-site

AED 120,000 - 160,000

Full time

2 days ago
Job summary

A leading technology company in Abu Dhabi is seeking an MLOps engineer to build and manage AI infrastructure. The role involves deploying models, automating CI/CD pipelines, and ensuring high availability of AI services. Candidates should have extensive experience in DevOps and MLOps, particularly with Kubernetes and monitoring tools.

Qualifications

  • 10+ years of overall experience in solution operations.
  • 3+ years in DevOps, MLOps, or AI/ML infrastructure roles.
  • Proven experience with Kubernetes or OpenShift in production.

Responsibilities

  • Build, deploy, monitor, and manage large-scale AI infrastructure.
  • Operate and manage Kubernetes or OpenShift clusters.
  • Automate CI/CD pipelines for model packaging and serving.

Skills

Kubernetes
OpenShift
CI/CD automation
Python
Bash
Linux system administration
AI infrastructure
Monitoring tools

Tools

MLflow
Prometheus
Grafana

Job description

Kinetic has partnered with a leading technology company hiring an MLOps engineer based in Abu Dhabi.

Please ensure you meet all the criteria below for your application to be considered. Suitable candidates will be contacted within 5 working days. If you are not contacted within that time, please consider your application unsuccessful.

Main responsibilities
  • Build, deploy, monitor, and manage large-scale AI infrastructure based on HGX H200 nodes.
  • Operate and manage Kubernetes or OpenShift clusters for multi-node orchestration.
  • Deploy and manage LLMs and other AI models for inference using Triton Inference Server or custom endpoints.
  • Automate CI/CD pipelines for model packaging, serving, retraining, and rollback using GitLab CI or ArgoCD.
  • Set up model and infrastructure monitoring systems (Prometheus, Grafana, NVIDIA DCGM).
  • Implement model drift detection, performance alerting, and inference logging.
  • Manage model checkpoints, reproducibility controls, and rollback strategies.
  • Track deployed model versions using MLflow or similar registry tools.
  • Implement secure access controls for model endpoints and data artifacts.
  • Collaborate with AI/Data engineers to integrate and deploy fine-tuned models and datasets.
  • Ensure high availability, performance, and observability of all AI services in production.
Requirements
  • 10+ years of overall experience in solution operations.
  • 3+ years of experience in DevOps, MLOps, or AI/ML infrastructure roles.
  • Proven experience with Kubernetes or OpenShift in production environments, preferably certified.
  • Experience with CI/CD automation tools on OpenShift/Kubernetes.
  • Hands-on experience with model registry systems (e.g., MLflow, Kubeflow).
  • Experience with monitoring tools (e.g., Prometheus, Grafana) and GPU workload optimization.
  • Strong scripting skills (Python, Bash) and Linux system administration knowledge.
  • Familiarity with deploying and scaling PyTorch or TensorFlow models for inference.
  • Applicants should be available for face-to-face interviews in Abu Dhabi.