Senior Machine Learning Operator

CDG ZIG PTE. LTD.

Singapore

On-site

SGD 20,000 - 60,000

Full time

Today

Job summary

A leading tech firm in Singapore is seeking a skilled Senior Machine Learning Operator (MLOps) to manage the infrastructure for its machine learning ecosystem. Responsibilities include building CI/CD pipelines, managing cloud resources, and optimizing ML workflows. Ideal candidates will have 3-5 years of experience, a strong DevOps foundation, and expertise in major cloud platforms. This role is vital for enabling faster innovation in data science and ML engineering.

Qualifications

3 to 5 years of experience in MLOps, DevOps, or a related field.
Strong command of DevOps practices and CI/CD pipelines.
Expertise in AWS, GCP, or Azure, including their native data, AI/ML, compute, and storage services.

Responsibilities

Build and maintain CI/CD pipelines for machine learning.
Manage and provision cloud resources using IaC.
Implement monitoring solutions to track model performance.

Skills

MLOps

DevOps

CI/CD pipelines

Containerization

Scripting

Cloud platforms

MLOps tools (MLflow, Kubeflow)

Workflow orchestration (Apache Airflow)

Monitoring tools (Prometheus, Grafana)

Education

Bachelor’s degree in Computer Science or related field

Tools

Terraform

Docker

Kubernetes

Machine Learning tools (e.g., MLflow, Kubeflow)

Azure

Apache Airflow

We are seeking a skilled Senior Machine Learning Operator (MLOps) with a strong DevOps foundation to build and manage the infrastructure that powers our entire machine learning ecosystem. You will be responsible for automating the ML lifecycle, ensuring our models are deployed, monitored, and scaled with maximum reliability and efficiency. Your work will be critical in enabling our data scientists and ML engineers to innovate faster by providing a robust, scalable, and automated platform.

Job Responsibilities

Design, build, and maintain robust, scalable CI/CD pipelines specifically for machine learning, automating data validation, model training, testing, and deployment.
Take full ownership of the ML infrastructure, managing and provisioning cloud resources (AWS, GCP, or Azure) using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
Develop and manage our containerization and orchestration strategy for ML services using Docker and Kubernetes (or platforms like Kubeflow).
Implement comprehensive monitoring solutions to track model performance, data/concept drift, and system health, with automated alerting and response mechanisms.
Establish and manage the central model registry and feature store, enforcing best practices for model versioning, lineage, and governance.
Automate and optimize ML workflows by integrating disparate systems, APIs, and tooling to ensure seamless operations from development to production.
Collaborate with Data Science, ML Engineering, and SRE teams to define and evangelize MLOps best practices across the organization.
Perform other ad hoc duties as assigned.

Job Requirements

Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, or a related technical field.
3 to 5 years of experience in MLOps, DevOps, or a related field.
Strong command of DevOps practices and extensive experience building and managing CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI).
Expertise in a major cloud platform (AWS, GCP, or Azure), including its native data, AI/ML, compute, and storage solutions.
Proven, hands‑on experience with Infrastructure as Code (IaC) tools, particularly Terraform or CloudFormation.
Practical experience with containerization (Docker) and orchestration technologies (Kubernetes) for deploying and scaling applications.
Strong scripting proficiency (e.g., Python, Bash) for automation and building tooling.
Direct experience with MLOps‑specific tools such as MLflow, Kubeflow, DVC, Seldon Core, or cloud‑native equivalents (e.g., Amazon SageMaker, Vertex AI).
Familiarity with the machine learning lifecycle and its unique challenges (e.g., experiment tracking, data versioning, model monitoring).
Experience supporting ML systems in the ride‑hailing industry, particularly around dynamic pricing, would be a strong plus.
Proficiency with workflow orchestration tools like Apache Airflow is highly desirable.
Experience with monitoring and logging tools like Guance, Prometheus, Grafana, or the ELK stack.
A proactive and analytical approach to problem‑solving, with a systems‑thinking mindset.
Strong ownership mentality with the ability to manage critical infrastructure and platforms independently.
Excellent communication skills to collaborate effectively with cross‑functional technical teams.