A leading company in the UAE is seeking a skilled DevOps Engineer specializing in MLOps. The successful candidate will be responsible for managing Kubernetes clusters, deploying AI models, and automating CI/CD pipelines. This role involves ensuring the high availability and performance of AI services in production, collaborating with data engineers, and implementing secure access controls. Candidates should have strong scripting skills and experience with monitoring tools.
Operate and manage Kubernetes or OpenShift clusters for multi-node orchestration
Deploy and manage LLMs and other AI models for inference using Triton Inference Server or custom endpoints
Automate CI/CD pipelines for model packaging, serving, retraining, and rollback using GitLab CI or ArgoCD
Set up model and infrastructure monitoring systems (Prometheus, Grafana, NVIDIA DCGM)
Implement model drift detection, performance alerting, and inference logging
Manage model checkpoints, reproducibility controls, and rollback strategies
Track deployed model versions using MLflow or equivalent registry tools (a minimal MLflow sketch follows this list)
Implement secure access controls for model endpoints and data artifacts
Collaborate with AI/Data Engineers to integrate and deploy fine-tuned datasets
Ensure high availability, performance, and observability of all AI services in production
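A minimal sketch of the model-version tracking and rollback workflow described above, using MLflow's Python API. The tracking URI, model name, and run ID are hypothetical placeholders, not details from this posting; stage transitions are just one rollback-friendly registry workflow MLflow supports.

```python
# Sketch: register a newly trained model version and promote it,
# keeping earlier versions in the registry for rollback.
import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # hypothetical tracking server
client = MlflowClient()

MODEL_NAME = "llm-inference"   # hypothetical registered-model name
RUN_ID = "abc123def456"        # hypothetical training/fine-tuning run ID

# Register the run's model artifact as a new version in the registry.
version = mlflow.register_model(f"runs:/{RUN_ID}/model", MODEL_NAME)

# Tag the version so the deployment is traceable and reproducible.
client.set_model_version_tag(MODEL_NAME, version.version, "deployed_by", "ci-pipeline")

# Promote to Production; previous versions remain available for rollback.
client.transition_model_version_stage(MODEL_NAME, version.version, stage="Production")
```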
3 years of experience in DevOps, MLOps, or AI/ML infrastructure roles
10 years of overall experience with solution operations
Proven experience with Kubernetes or OpenShift in production environments; certification preferred
Familiarity with deploying and scaling PyTorch or TensorFlow models for inference
Experience with CI/CD automation tools on OpenShift/Kubernetes
Hands-on experience with model registry systems (e.g. MLflow, Kubeflow)
Experience with monitoring tools (e.g. Prometheus, Grafana) and GPU workload optimization (a minimal metrics sketch follows this list)
Strong scripting skills (Python, Bash) and Linux system administration knowledge
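As an illustration of the Prometheus/Grafana monitoring mentioned above, a minimal sketch that exposes inference metrics for Prometheus to scrape. The metric names, port, and model label are hypothetical, not requirements of the role.

```python
# Sketch: expose request-count and latency metrics for Prometheus to scrape.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Total inference requests", ["model"])
LATENCY = Histogram("inference_latency_seconds", "Inference latency in seconds", ["model"])

def run_inference(model: str) -> None:
    # Placeholder for a real model call; records latency and count per model label.
    with LATENCY.labels(model).time():
        time.sleep(random.uniform(0.01, 0.05))  # simulated inference work
    REQUESTS.labels(model).inc()

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        run_inference("llm-v1")
```

A Prometheus scrape job pointed at port 8000 could then feed Grafana dashboards and alerting rules.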