Senior AI Infrastructure & Platform Engineer - Riyadh, KSA

DeepSource

Saudi Arabia

On-site

SAR 299,000 - 450,000

Full time

Today

Job summary

A technology company in Saudi Arabia is seeking a Senior AI Infrastructure & Platform Engineer to build and manage scalable AI infrastructure. The role involves deploying and optimizing GPU clusters, managing orchestration tools, and developing CI/CD pipelines for AI/ML workloads. Ideal candidates will have experience with Nvidia orchestration tools and cluster scheduling. This position offers the opportunity to work closely with data scientists and ML engineers to define infrastructure requirements in a high-performance environment.

Qualifications

  • Experience deploying and managing GPU-based compute infrastructures.
  • Strong knowledge of orchestration tools such as Nvidia AI Enterprise Suite.
  • Proficiency in monitoring and troubleshooting AI/ML workloads.

Responsibilities

  • Deploy, maintain, and optimize GPU-based compute clusters.
  • Manage GPU orchestration and scheduling tools.
  • Develop automation scripts and CI/CD pipelines.

Skills

GPU orchestration tools and platforms
Cluster scheduling tools
Automation scripting
CI/CD pipelines

Tools

Nvidia Base Command Manager
Slurm
Vanilla Kubernetes
Canonical Ubuntu

Job description

Role Overview

We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client’s team in Riyadh. In this role, you’ll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.


Key Responsibilities

  • Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
  • Manage and operate GPU orchestration tools and platforms such as:
    • Nvidia Base Command Manager (critical)
    • Nvidia AI Enterprise Suite
    • Nvidia GPU and Network Operators
    • Nvidia NIMs and Blueprints
  • Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
    • Slurm (critical)
    • Vanilla Kubernetes
  • Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
  • Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
  • Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
  • Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
  • Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.