Senior ML Platform Engineer

AIRWALLEX (SINGAPORE) PTE. LTD.

Singapore

On-site

SGD 100,000 - 120,000

Full time

Posted yesterday

Job summary

A leading fintech company in Singapore is seeking an experienced ML Platform Engineer to design, implement, and maintain their MLOps infrastructure. The ideal candidate will have extensive experience in backend software development, particularly in AI and MLOps. Responsibilities include building low-latency serving infrastructure and implementing CI/CD pipelines. A degree in Computer Science or related fields is required. Join the team to contribute to cutting-edge AI solutions in a vibrant environment.

Qualifications

  • 5+ years in backend software development, including at least 2 years focused on AI/ML platforms.
  • Deep expertise in MLOps practices, including automated deployment pipelines.
  • Proven experience designing and implementing low-latency model serving solutions.

Responsibilities

  • Design, build, and maintain the end-to-end MLOps platform using Kubernetes.
  • Implement and optimize CI/CD pipelines to automate model training and deployment.
  • Ensure platform security and manage secrets for sensitive data.

Skills

  • Backend software development
  • AI/ML platforms
  • MLOps practices
  • Python
  • Communication skills
  • High-quality code writing

Education

Degree in Computer Science or Mathematics

Tools

  • Kubernetes
  • Terraform
  • CI/CD tools
  • Argo
  • Kubeflow

Job description

About the Team:

As part of our brand-new AI team, we're building cutting-edge AI and the platforms behind it to transform how we provide support and automation solutions for our internal teams and external consumers. We aim to unlock the knowledge that exists within Airwallex to power use cases across our organization. The team is crucial in driving innovation and setting the standard for future developments in this exciting new field.

Role & Project Scope:

We are seeking a skilled and passionate ML Platform Engineer to join our team and build the next generation of our machine learning infrastructure. You will be responsible for designing, implementing, and maintaining the core MLOps platform that empowers our Data Science and ML Engineering teams to rapidly develop, deploy, and monitor high-performance models at scale.

Crucially, you will contribute to the evolution of our unified AI Platform, covering both traditional ML and our growing LLM (Large Language Model) platform.

What You'll Do:

  • Platform Development: Design, build, and maintain the end-to-end MLOps platform using Kubernetes and Cloud Services.

  • Infrastructure as Code (IaC): Use Terraform or similar tools to manage, provision, and scale all ML-related infrastructure securely and efficiently.

  • Pipeline Automation: Implement and optimize CI/CD/CT (Continuous Integration, Delivery, Training) pipelines to automate model training, testing, packaging, and deployment using tools like Argo and Kubeflow Pipelines.

  • Serving Infrastructure: Build highly available, low-latency, and high-throughput model serving infrastructure.

  • Observability: Implement robust monitoring, alerting, and logging solutions to track infrastructure health, model performance, and data/model drift.

  • Tooling & Support: Evaluate, integrate, and support ML tools such as Feature Stores and distributed model training pipelines.

  • Security & Compliance: Ensure platform security, implement RBAC (Role-Based Access Control), and manage secrets for sensitive data and production environments.

  • Collaboration: Work closely with Data Scientists and ML Engineers to understand their needs and provide technical guidance on best practices for scaling their models.

What You Need to Have:

  • 5+ years in backend software development, including at least 2 years focused on AI/ML platforms or MLOps infrastructure.

  • Deep expertise in MLOps practices, including automated deployment pipelines, model optimization, and production lifecycle management.

  • Proven experience designing and implementing low-latency model serving solutions.

  • Proficiency in Python.

  • Skill in writing high-quality, maintainable code.

  • Experience designing and developing large-scale distributed systems with high concurrency, low-latency inference, and high availability.

  • Excellent communication and mentoring abilities.

  • A relevant degree in Computer Science, Mathematics, or related fields.

Preferred qualifications:

  • Familiarity with distributed compute/training frameworks (e.g., Ray, Spark).

  • Experience configuring and managing ML workflows on cloud infrastructure (e.g., Kubernetes, Kubeflow).

  • Working knowledge of LLM serving optimization (e.g., vLLM, TGI, Triton) and GPU resource management.
