An innovative technology firm is seeking a skilled ML Infrastructure Senior Engineer to join their dynamic AI/ML platform team. This role focuses on deploying and optimizing large-scale machine learning systems, leveraging advanced inference engines. You will collaborate closely with research and data science teams to enhance model capabilities and manage the MLOps lifecycle. If you're passionate about machine learning and eager to work in a fast-paced environment, this opportunity offers a chance to make a significant impact in the field of AI.
Job Title: ML Infrastructure Senior Engineer
Location: Abu Dhabi, United Arab Emirates [Full relocation package provided]
Job Overview
We are seeking a skilled ML Infrastructure Engineer to join our growing AI/ML platform team. This role is ideal for someone who is passionate about large-scale machine learning systems and has hands-on experience deploying LLMs/SLMs using advanced inference engines like vLLM. You will play a critical role in designing, deploying, optimizing, and managing ML models and the infrastructure around them for inference, fine-tuning, and continued pre-training.
Key Responsibilities
· Deploy large or small language models (LLMs/SLMs) using inference engines (e.g., vLLM, Triton).
· Collaborate with research and data science teams to fine-tune models or build automated fine-tuning pipelines.
· Extend inference-level capabilities by integrating advanced features such as multi-modality, real-time inferencing, model quantization, and tool-calling.
· Evaluate and recommend optimal hardware configurations (GPU, CPU, RAM) based on model size and workload patterns.
· Build, test, and optimize LLM inference for consistent model deployment.
· Implement and maintain infrastructure-as-code to manage scalable, secure, and elastic cloud-based ML environments.
· Ensure seamless orchestration of the MLOps lifecycle, including experiment tracking, model registry, deployment automation, and monitoring.
· Manage ML model lifecycle on AWS (preferred) or other cloud platforms.
· Understand LLM architecture fundamentals to design efficient scalability strategies for both inference and fine-tuning processes.
Required Skills
Core Skills:
· Proven experience deploying LLMs or SLMs using inference engines like vLLM, TGI, or similar.
· Experience in fine-tuning language models or creating automated pipelines for model training and evaluation.
· Deep understanding of LLM architecture fundamentals (e.g., attention mechanisms, transformer layers) and how they influence infrastructure scalability and optimization.
· Strong understanding of hardware-resource alignment for ML inference and training.
Technical Proficiency:
· Programming experience in Python and C/C++, especially for inference optimization.
· Solid understanding of the end-to-end MLOps lifecycle and related tools.
· Experience with containerization, image building, and deployment (e.g., Docker; Kubernetes optional).
· Hands-on experience with AWS services for ML workloads (SageMaker, EC2, EKS, etc.) or equivalent services in Azure/GCP.
· Ability to manage cloud infrastructure to ensure high availability, scalability, and cost efficiency.
Nice-to-Have
· Experience with ML orchestration platforms like MLflow, SageMaker Pipelines, Kubeflow, or similar.
· Familiarity with model quantization, pruning, or other performance optimization techniques.
· Exposure to distributed training frameworks like Unsloth, DeepSpeed, Accelerate, or FSDP.
Referrals increase your chances of interviewing at AI71 by 2x