MLOps Engineer

Flow Talent

United Arab Emirates

On-site

AED 120,000 - 160,000

Full time

2 days ago
Job summary

A leading technology company in Abu Dhabi is seeking an MLOps engineer to build and manage AI infrastructure. The role involves deploying models, automating CI/CD pipelines, and ensuring high availability of AI services. Candidates should have extensive experience in DevOps and MLOps, particularly with Kubernetes and monitoring tools.

Qualifications

  • 10+ years of overall experience in solution operations.
  • 3+ years in DevOps, MLOps, or AI/ML infrastructure roles.
  • Proven experience with Kubernetes or OpenShift in production.

Responsibilities

  • Build, deploy, monitor, and manage large-scale AI infrastructure.
  • Operate and manage Kubernetes or OpenShift clusters.
  • Automate CI/CD pipelines for model packaging and serving.

Skills

Kubernetes
OpenShift
CI/CD automation
Python
Bash
Linux system administration
AI infrastructure
Monitoring tools

Tools

MLflow
Prometheus
Grafana

Job description

Kinetic has partnered with a leading technology company hiring an MLOps engineer based in Abu Dhabi.

Please ensure you meet all the criteria below for your application to be considered. Suitable candidates will be contacted within 5 working days. If you are not contacted within that time, please consider your application unsuccessful.

Main responsibilities
  • Build, deploy, monitor, and manage large-scale AI infrastructure based on HGX H200 nodes.
  • Operate and manage Kubernetes or OpenShift clusters for multi-node orchestration.
  • Deploy and manage LLMs and other AI models for inference using Triton Inference Server or custom endpoints.
  • Automate CI/CD pipelines for model packaging, serving, retraining, and rollback using GitLab CI or ArgoCD.
  • Set up model and infrastructure monitoring systems (Prometheus, Grafana, NVIDIA DCGM).
  • Implement model drift detection, performance alerting, and inference logging.
  • Manage model checkpoints, reproducibility controls, and rollback strategies.
  • Track deployed model versions using MLflow or similar registry tools.
  • Implement secure access controls for model endpoints and data artifacts.
  • Collaborate with AI/Data engineers to integrate and deploy fine-tuned models and datasets.
  • Ensure high availability, performance, and observability of all AI services in production.
Requirements
  • 10+ years of overall experience in solution operations.
  • 3+ years of experience in DevOps, MLOps, or AI/ML infrastructure roles.
  • Proven experience with Kubernetes or OpenShift in production environments, preferably certified.
  • Experience with CI/CD automation tools on OpenShift/Kubernetes.
  • Hands-on experience with model registry systems (e.g., MLflow, Kubeflow).
  • Experience with monitoring tools (e.g., Prometheus, Grafana) and GPU workload optimization.
  • Strong scripting skills (Python, Bash) and Linux system administration knowledge.
  • Familiarity with deploying and scaling PyTorch or TensorFlow models for inference.
  • Applicants should be available for face-to-face interviews in Abu Dhabi.