Principal Machine Learning Engineer

GRABTAXI HOLDINGS PTE. LTD.

Singapore

On-site

SGD 100,000 - 160,000

Full time

Today

Job summary

A leading transportation company in Singapore seeks a Principal Machine Learning Engineer to build and optimize AI infrastructure. You will lead projects, mentor teams, and design large-scale systems using Kubernetes and Ray. Candidates should have extensive experience in AI/ML infrastructure and programming skills in Python. This role requires on-site presence.

Qualifications

6+ years of experience building large-scale AI/ML or distributed systems infrastructure.
At least 2 years in a technical leadership capacity.
Proficiency in Python and one or more system-level languages (e.g., Go, Rust, C++).

Responsibilities

Design and scale the next generation of AI infrastructure.
Drive initiatives to optimize GPU/CPU utilization for AI workloads.
Provide technical mentorship and foster a culture of excellence.

Skills

Deep Infrastructure & Distributed Systems Expertise

Cloud & Compute Optimization

Programming & Engineering Excellence

Python

Cloud infrastructure management

Technical leadership

Tools

Ray

Kubernetes

Spark

Job Description

Get to Know the Team

The AI Platform team empowers Grab teams to leverage advanced AI seamlessly and effectively. We're building cutting‑edge tools and infrastructure to democratize AI capabilities, accelerate innovation, and enhance Grab's products and services at scale.

Get to Know the Role

As a Principal Machine Learning Engineer focused on AI Infrastructure, you will shape the backbone of Grab's AI ecosystem. You will design and evolve scalable platforms for model training, serving, and evaluation—anchored on technologies like Ray and Kubernetes—that enable thousands of engineers and data scientists to innovate safely and efficiently. Your role is pivotal in ensuring Grab's AI foundation is cost‑efficient, resilient, and future‑ready.

You will report to the Head of Engineering.

This role is onsite at the Grab office.

The Critical Tasks You Will Perform

Independently Lead and Execute: Demonstrate strength as a technical lead by taking full responsibility for project conception, planning, and execution.
Architect the Future of AI Infrastructure: Design and scale the next generation of distributed systems for model training, inference, and experimentation on Kubernetes and Ray.
Build Platforms for Scale: Develop core abstractions, APIs, and services that make AI experimentation, deployment, and monitoring seamless across Grab.
Enable Cost‑Efficient AI at Scale: Drive initiatives to optimize GPU/CPU utilization, storage, and networking for large‑scale AI workloads to achieve significant efficiency gains.
Integrate Research with Production Systems: Translate cutting‑edge distributed training, scheduling, and serving techniques into production‑ready systems that can handle Grab's scale.
Influence AI Platform Strategy: Partner with engineering and product leadership to set direction for Grab's AI infrastructure roadmap, balancing long‑term vision with practical execution.
Mentor and Inspire: Provide deep technical mentorship, foster platform‑thinking, and cultivate a culture of excellence across engineering and research teams.

Qualifications

What Essential Skills You Will Need

Experience

6+ years of experience building large‑scale AI/ML or distributed systems infrastructure.
At least 2 years in a technical leadership capacity, driving architectural decisions and mentoring teams.

Deep Infrastructure & Distributed Systems Expertise

Hands‑on experience with Ray (Ray Train, Ray Serve, Ray Tune) and distributed data processing frameworks (e.g., Dask, Spark).
Expertise in Kubernetes, container orchestration, autoscaling, and cloud‑native architectures.

Systems & Platform Engineering

Experience designing and delivering developer platforms that abstract away complexity while ensuring scale.
Background in APIs, microservices, observability, and CI/CD best practices.

Cloud & Compute Optimization

Experience running large‑scale AI/ML workloads on cloud infrastructure (AWS/GCP/Azure).
Expertise in GPU scheduling, heterogeneous clusters, and cost‑optimization strategies.

Programming & Engineering Excellence

Proficiency in Python and one or more system‑level languages (e.g., Go, Rust, C++).
Strong engineering fundamentals in concurrency, networking, storage, and system performance.

Strategic Visionary & Leadership

Strategic AI Infrastructure Leadership: Ability to develop roadmaps that align AI infrastructure with core business priorities.
Platform Empowerment: Passion for building platforms that accelerate impact for engineers, researchers, and product teams.
Influence & Mentorship: Ability to influence technical direction across diverse teams, with a strong track record of mentoring engineers.