Cloud Infrastructure Engineer

INFINITY LINKS PTE. LTD.

Singapore

On-site

SGD 70,000 - 95,000

Full time

Job summary

A technology solutions provider in Singapore seeks a skilled Cloud Infrastructure Engineer to design and maintain scalable GPU infrastructure. The ideal candidate should have 3–7 years’ experience in DevOps and proficiency with tools like Kubernetes and Terraform. Responsibilities include automating provisioning, ensuring security, and optimizing performance in multi-tenant environments. Join a dynamic team dedicated to advancing AI applications.

Skills

DevOps
Site Reliability
Infrastructure Engineering
Linux Systems
Kubernetes
Scripting (Bash, Python, Go)
Networking

Tools

Terraform
Ansible
Docker
Prometheus
Grafana
ELK
GitLab CI
ArgoCD
Jenkins
Flux

Job description

Overview

IXL Cloud enables businesses, start-ups, researchers, and developers to train, deploy, and scale their AI systems with unmatched performance and flexibility.

We accelerate their AI journey by delivering leading GPU infrastructure, seamless scalability, and AI-first operational support—helping bring advanced AI applications to fruition without the complexity of managing underlying compute architecture.

Responsibilities

As a Cloud Infrastructure Engineer, you will:

  • Design, deploy, and maintain scalable cloud infrastructure for GPU workloads using tools like Terraform, Ansible, and Kubernetes.
  • Automate provisioning of compute resources across bare-metal and cloud environments.
  • Manage container orchestration platforms (Kubernetes, Docker) for multi-tenant GPU cluster environments.
  • Monitor infrastructure performance, uptime, and system health using observability tools (Prometheus, Grafana, ELK, etc.).
  • Maintain and optimize storage, networking, and load balancing layers for high-throughput AI workloads.
  • Implement CI/CD pipelines for both infrastructure and application-level changes.
  • Collaborate with software engineers, platform teams, and AI researchers to understand workload needs and optimize system performance accordingly.
  • Ensure infrastructure security, including secrets management, RBAC, and compliance with best practices.
  • Troubleshoot and resolve infrastructure incidents, scaling issues, and performance bottlenecks.
  • Support hardware provisioning, firmware updates, and GPU driver/CUDA installations.

Qualifications

  • 3–7 years of experience in DevOps, Site Reliability, or Infrastructure Engineering roles.
  • Deep experience managing Linux systems in production environments.
  • Experience deploying and managing Kubernetes clusters at scale (bare metal or cloud-native).
  • Familiarity with GPU drivers (NVIDIA, CUDA) and workload optimization is a plus.
  • Proficiency in scripting languages (Bash, Python, Go, etc.).
  • Strong understanding of networking, firewalls, and storage systems in distributed compute environments.
  • Experience with CI/CD tools such as GitLab CI, ArgoCD, Jenkins, or Flux.
  • Excellent communication and documentation skills.