DevOps Engineer – AI Infrastructure & Kubernetes - GRaduate Industry Traineeships (GRIT) Programme

RAYDIAN CLOUD PTE. LTD.

Singapore

On-site

SGD 75,000 - 120,000

Full time

Today

Job summary

A cloud services company in Singapore is seeking a forward-thinking DevOps Engineer to build and scale infrastructure for AI workloads. The role involves optimizing cloud infrastructure and deploying Kubernetes clusters for AI projects. Candidates should have strong skills in Infrastructure as Code and cloud platforms. The company offers competitive compensation and a flexible work culture.

Benefits

Competitive compensation
Flexible work culture
Growth opportunities

Qualifications

  • Strong experience with Kubernetes, including GPU scheduling and Helm.
  • Proficiency in Infrastructure as Code tools.
  • Familiarity with cloud platforms and AI services.

Responsibilities

  • Design and manage cloud infrastructure optimized for AI/ML workloads.
  • Deploy and maintain Kubernetes clusters tailored for GPU scheduling.
  • Build CI/CD pipelines for AI model training and deployment.

Skills

Kubernetes
Infrastructure as Code tools (Terraform, Pulumi)
Cloud platforms (AWS, Azure, GCP)
CI/CD tools (GitHub Actions, GitLab CI)
Scripting skills (Python, Bash, Go)
ML model lifecycle and data pipeline orchestration
Communication and collaboration skills

Tools

Terraform
Pulumi
Prometheus
Grafana
OpenTelemetry

Job description
Overview

About the Role

Raydian Cloud is seeking a forward-thinking DevOps Engineer to help build and scale infrastructure that powers cutting-edge AI workloads. You’ll work at the intersection of cloud-native technologies and Artificial Intelligence operations (AIOps), enabling high-performance, secure, and automated environments for AI development and deployment. Your expertise in Infrastructure as Code and Kubernetes will be critical in supporting scalable AI pipelines and platform services.

Responsibilities
  • Design and manage cloud infrastructure optimized for AI/ML workloads using Infrastructure as Code (Terraform, Pulumi, etc.)
  • Deploy and maintain Kubernetes clusters tailored for GPU scheduling, distributed training, and inference workloads
  • Build CI/CD pipelines for AI model training, validation, and deployment across environments
  • Collaborate with data scientists and ML engineers to streamline model lifecycle management
  • Implement observability and monitoring for AI services (e.g., Prometheus, Grafana, OpenTelemetry)
  • Ensure infrastructure security, compliance, and cost-efficiency in multi-tenant AI environments
  • Automate provisioning of AI-specific resources (e.g., GPU nodes, storage volumes, feature stores)
  • Document infrastructure patterns, DevOps workflows, and platform architecture

Required Skills & Qualifications
  • Strong experience with Kubernetes, including GPU scheduling and Helm
  • Proficiency in Infrastructure as Code tools (Terraform, Pulumi, etc.)
  • Familiarity with cloud platforms (AWS, Azure, GCP) and AI services (e.g., SageMaker, Vertex AI)
  • Experience with CI/CD tools (GitHub Actions, GitLab CI, Argo Workflows)
  • Scripting skills in Python, Bash, or Go
  • Understanding of ML model lifecycle and data pipeline orchestration
  • Excellent communication and collaboration skills across technical and business teams

Nice to Have
  • Experience with Kubeflow, MLflow, or similar MLOps frameworks
  • Knowledge of containerized AI workloads (e.g., TensorFlow Serving, Triton Inference Server)
  • Familiarity with service mesh technologies (Istio, Linkerd) in AI microservices
  • Certifications in Kubernetes or cloud platforms (e.g., CKA, AWS Certified DevOps Engineer)

Why Join Raydian Cloud?
  • Shape the future of AI infrastructure and platform services
  • Work with a visionary team blending deep tech and strategic execution
  • Influence architecture decisions in a fast-moving AI startup environment
  • Competitive compensation, flexible work culture, and growth opportunities