Job Description
Role: Senior MLOps Engineer
Location: Abu Dhabi, UAE (Full Relocation Provided)
Company: AI71
About Us
AI71 is an applied research team committed to building responsible and impactful AI
agents that empower knowledge workers. In partnership with the Technology Innovation
Institute (TII), we drive innovation through cutting-edge AI research and development. Our
mission is to translate breakthroughs in machine learning into transformative products
that reshape industries.
Senior MLOps Engineer
AI71 is seeking a Senior MLOps Engineer to lead the development and management of our infrastructure for training, deploying, and maintaining ML models. This role is critical to operationalizing state-of-the-art systems and ensuring high-performance delivery across research and production environments.
The successful candidate will be responsible for designing and implementing
infrastructure to support efficient model deployment, inference, monitoring, and
retraining. This includes close collaboration with cross-functional teams to integrate
machine learning models into scalable and secure production pipelines, enabling the
delivery of real-time, data-driven solutions across various domains.
Key Responsibilities
• Model Deployment: Lead the deployment and scaling of LLMs and other deep
learning models using inference engines such as vLLM, Triton, or TGI, ensuring
optimal performance and reliability.
• Pipeline Engineering: Design and maintain automated pipelines for model fine-tuning, evaluation, versioning, and continuous delivery using tools like MLflow, SageMaker Pipelines, or Kubeflow.
• Infrastructure Management: Architect and manage cloud-based, cost-effective infrastructure for machine learning workloads using AWS (SageMaker, EC2, EKS, Lambda) or equivalent platforms.
• Performance Optimization: Implement monitoring, logging, and optimization
strategies to meet latency, throughput, and availability requirements across ML
services.
• Collaboration: Work closely with ML researchers, data scientists, and engineers to
support experimentation workflows, streamline deployment, and translate research
prototypes into production-ready solutions.
• Automation & DevOps: Develop infrastructure-as-code (IaC) solutions to support
repeatable, secure deployments and continuous integration/continuous delivery
(CI/CD) for ML systems.
• Model Efficiency: Apply model optimization techniques such as quantization,
pruning, and multi-GPU/distributed inference to enhance system performance and
cost-efficiency.
Qualifications
• Professional Experience: Minimum 5 years of experience in MLOps, ML
infrastructure, or machine learning engineering, with a strong record of managing
end-to-end ML model lifecycles.
• Deployment Expertise: Proven experience in deploying large-scale models in
production environments with advanced inference techniques.
• Cloud Proficiency: In-depth expertise in cloud services (preferably AWS), including
infrastructure management, scaling, and cost optimization for ML workloads.
• Programming Skills: Strong programming proficiency in Python, with additional
experience in C/C++ for performance-sensitive applications.
• Tooling Knowledge: Proficiency in MLOps frameworks such as MLflow, Kubeflow,
or SageMaker Pipelines; familiarity with Docker and Kubernetes.
• Optimization Techniques: Hands-on experience with model performance
optimization techniques and distributed training frameworks (e.g., DeepSpeed,
FSDP, Accelerate).
• Educational Background: Bachelor’s or Master’s degree in Computer Science,
Machine Learning, Data Engineering, or a related technical field.
Why Join AI71?
• Advanced Technology Stack: Work with some of the most capable large
models and cutting-edge ML infrastructure.
• High-Impact Work: Contribute directly to the deployment of AI solutions that
deliver measurable business value across industries.
• Collaboration-Driven Environment: Engage with a high-performing,
interdisciplinary team focused on continuous innovation.
• Robust Infrastructure: Access high-performance compute resources to support
experimentation and scalable deployment.
• Relocation Package: Full support for relocation to Abu Dhabi, with a competitive compensation package and lifestyle benefits.