Lead AI Infrastructure Engineer

ThoughtWorks

Singapore

On-site

SGD 80,000 - 100,000

Full time

2 days ago

Job summary

A leading technology consultancy in Singapore is seeking a skilled infrastructure engineer to design and manage GPU-based AI systems. In this role, you will architect scalable systems using advanced tools and mentor teams in best practices. Ideal candidates will have expertise in GPU tech, orchestration frameworks, and a passion for innovative AI solutions. Join an environment that supports professional growth and enables you to make impactful contributions to clients' AI strategies.

Benefits

Career development programs
Collaborative work culture

Qualifications

  • Expertise in GPU-based infrastructure (H100, GB200) for AI applications.
  • Strong knowledge of orchestration frameworks like Kubernetes and Ray.
  • Experience with building resilient AI systems and monitoring tools.

Responsibilities

  • Design GPU infrastructure for cloud and on-site environments.
  • Automate deployment and provisioning of AI resources.
  • Lead technical engagements with clients to align AI goals.

Skills

GPU-based infrastructure expertise
Kubernetes knowledge
Involvement in AI systems performance tuning
Infrastructure automation proficiency
Experience with monitoring tools

Tools

Terraform
Helm
CI/CD pipelines
Prometheus
Grafana
OpenTelemetry
Job description
Job responsibilities
  • Design and operate GPU-based infrastructure (e.g., NVIDIA GB200, H100) across cloud and self-hosted environments.
  • Architect scalable inference platforms that support real-time and batch serving with high availability, load balancing, and fault tolerance.
  • Integrate inference workloads with orchestration frameworks such as Kubernetes, Slurm, and Ray, as well as observability stacks like Prometheus, Grafana, and OpenTelemetry.
  • Automate infrastructure provisioning and deployment using Terraform, Helm, and CI/CD pipelines.
  • Collaborate with ML engineers to co-design systems optimized for low-latency serving, continuous batching, and advanced inference optimization techniques (quantization, distillation, pruning, KV caching).
  • Lead client engagements by shaping technical roadmaps that align AI infrastructure with business objectives, ensuring compliance, scalability, and performance.
  • Champion DevOps and agile practices to accelerate delivery while maintaining reliability, quality, and resilience.
  • Mentor and guide teams in best practices for AI infrastructure engineering, fostering a culture of technical excellence and innovation.
Job Qualifications
Technical Skills
  • Expertise in GPU-based infrastructure for AI (H100, GB200, or similar), including scaling across clusters.
  • Strong knowledge of orchestration frameworks: Kubernetes, Ray, Slurm.
  • Experience with inference-serving frameworks (vLLM, NVIDIA Triton, DeepSpeed).
  • Proficiency in infrastructure automation (Terraform, Helm, CI/CD pipelines).
  • Experience building resilient, high-throughput, low-latency systems for AI.
  • Background in observability and monitoring: Prometheus, Grafana, OpenTelemetry.
  • Familiarity with security, compliance, and governance concerns in AI infrastructure (data sovereignty, air-gapped deployments, audit logging).
  • Solid understanding of DevOps, cloud-native architectures, and Infrastructure as Code.
  • Exposure to multi-cloud and hybrid deployments (AWS, GCP, Azure, sovereign/private cloud).
  • Experience with benchmarking and cost/performance tuning for AI systems.
  • Background in MLOps or collaboration with ML teams on large-scale AI production systems.
Professional Skills
  • Proven ability to partner with senior client stakeholders (CTO, CIO, COO) and translate technical strategy into business outcomes.
  • Skilled at leading multi-disciplinary teams and building trust across diverse technical and business functions.
  • Strong communication skills, with the ability to explain complex AI infrastructure concepts to both technical and non-technical audiences.
  • Comfortable navigating uncertainty, making pragmatic decisions, and adapting quickly to evolving technologies.
  • Passionate about creating scalable, sustainable, and high-impact solutions that help transform industries with AI.
Other things to know
Learning & Development

There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This means your career is supported by interactive tools, numerous development programs, and teammates who want to help you grow. We see value in helping each other be our best, and that extends to empowering our employees in their career journeys.

Job Details

Country: Singapore

City: Singapore

Date Posted: 11-13-2025

Industry: Information Technology

Employment Type: Employee
