Cloud Infrastructure Engineer

INFINITY LINKS PTE. LTD.

Singapore

On-site

SGD 80,000 - 120,000

Full time

Today

Job summary

A cloud services provider in Singapore is looking for a Cloud Infrastructure Engineer to design, deploy, and maintain scalable GPU infrastructure. The ideal candidate will have experience in DevOps and a solid background in managing Kubernetes and Linux systems. Responsibilities include automating resource provisioning, managing container orchestration, and ensuring infrastructure security.

Qualifications

  • 3–7 years of experience in DevOps, Site Reliability, or Infrastructure Engineering roles.
  • Deep experience managing Linux systems in production environments.
  • Experience deploying and managing Kubernetes clusters at scale.

Responsibilities

  • Design, deploy, and maintain scalable cloud infrastructure for GPU workloads.
  • Automate provisioning of compute resources across environments.
  • Manage container orchestration platforms for GPU clusters.

Skills

DevOps experience
Linux systems management
Kubernetes management
Scripting (Bash, Python, Go)
Networking knowledge

Tools

Terraform
Ansible
Docker
Prometheus
Grafana
ELK
GitLab CI
Jenkins

Job description

Overview

IXL Cloud enables businesses, start-ups, researchers, and developers to train, deploy, and scale their AI systems with unmatched performance and flexibility.

We accelerate their AI journey by delivering leading GPU infrastructure, seamless scalability, and AI-first operational support, helping bring advanced AI applications to fruition without the complexity of managing the underlying compute architecture.

Responsibilities

As a Cloud Infrastructure Engineer, you will:

  • Design, deploy, and maintain scalable cloud infrastructure for GPU workloads using tools like Terraform, Ansible, and Kubernetes.
  • Automate provisioning of compute resources across bare-metal and cloud environments.
  • Manage container orchestration platforms (Kubernetes, Docker) for multi-tenant GPU cluster environments.
  • Monitor infrastructure performance, uptime, and system health using observability tools (Prometheus, Grafana, ELK, etc.).
  • Maintain and optimize storage, networking, and load balancing layers for high-throughput AI workloads.
  • Implement CI/CD pipelines for both infrastructure and application-level changes.
  • Collaborate with software engineers, platform teams, and AI researchers to understand workload needs and optimize system performance accordingly.
  • Ensure infrastructure security, including secrets management, RBAC, and compliance with best practices.
  • Troubleshoot and resolve infrastructure incidents, scaling issues, and performance bottlenecks.
  • Support hardware provisioning, firmware updates, and GPU driver/CUDA installations.

Qualifications

  • 3–7 years of experience in DevOps, Site Reliability, or Infrastructure Engineering roles.
  • Deep experience managing Linux systems in production environments.
  • Experience deploying and managing Kubernetes clusters at scale (bare metal or cloud-native).
  • Familiarity with NVIDIA GPU drivers, CUDA, and workload optimization is a plus.
  • Proficiency in scripting languages (Bash, Python, Go, etc.).
  • Strong understanding of networking, firewalls, and storage systems in distributed compute environments.
  • Experience with CI/CD tools such as GitLab CI, ArgoCD, Jenkins, or Flux.
  • Excellent communication and documentation skills.