Senior Machine Learning Operator

CDG Zig

Singapore

On-site

SGD 70,000 - 90,000

Full time

Today

Job summary

A technology company based in Singapore is looking for a Senior Machine Learning Operator (MLOps) to manage their ML infrastructure. You will be responsible for designing CI/CD pipelines, managing cloud resources, and automating the ML lifecycle to enhance operational efficiency. Ideal candidates will have experience in MLOps, a strong DevOps foundation, and excellent problem-solving abilities.

Qualifications

  • Minimum of 3 to 5 years of experience in MLOps, DevOps, or a related field.
  • Strong command of DevOps practices.
  • Experience with monitoring and logging tools like Grafana or Prometheus.

Responsibilities

  • Design and maintain robust CI/CD pipelines for machine learning.
  • Manage cloud resources using Infrastructure as Code tools.
  • Implement monitoring solutions for model performance and system health.

Skills

CI/CD pipeline management
DevOps practices
Containerization
Orchestration technologies
Scripting proficiency
MLOps tools

Education

Bachelor’s degree in Computer Science or a related field

Tools

Terraform
Docker
Kubernetes
GitHub Actions
AWS
GCP
Azure
MLflow
Apache Airflow
Prometheus
Grafana

Job description

We are seeking a skilled Senior Machine Learning Operator (MLOps) with a strong DevOps foundation to build and manage the infrastructure that powers our entire machine learning ecosystem. You will be responsible for automating the ML lifecycle, ensuring our models are deployed, monitored, and scaled with maximum reliability and efficiency. Your work will be critical in enabling our data scientists and ML engineers to innovate faster by providing a robust, scalable, and automated platform.

Job Responsibilities:
  • Design, build, and maintain robust, scalable CI/CD pipelines specifically for machine learning, automating data validation, model training, deployment, and testing.
  • Take full ownership of the ML infrastructure, managing and provisioning cloud resources (AWS, GCP, or Azure) using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
  • Develop and manage our containerization and orchestration strategy for ML services using Docker and Kubernetes (or platforms like Kubeflow).
  • Implement comprehensive monitoring solutions to track model performance, data/concept drift, and system health, with automated alerting and response mechanisms.
  • Establish and manage the central model registry and feature store, enforcing best practices for model versioning, lineage, and governance.
  • Automate and optimize ML workflows by integrating disparate systems, APIs, and tooling to ensure seamless operations from development to production.
  • Collaborate with Data Science, ML Engineering, and SRE teams to define and evangelize MLOps best practices across the organization.
  • Any other ad hoc duties as assigned.

Job Requirements:
  • Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, or a related technical field.
  • Minimum of 3 to 5 years of experience in MLOps, DevOps, or a related field.
  • Strong command of DevOps practices and extensive experience building and managing CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI).
  • Expertise in a major cloud platform (AWS, GCP, or Azure), including its native data, AI/ML, compute, and storage solutions.
  • Proven, hands‑on experience with Infrastructure as Code (IaC) tools, particularly Terraform or CloudFormation.
  • Practical experience with containerization (Docker) and orchestration technologies (Kubernetes) for deploying and scaling applications.
  • Strong scripting proficiency (e.g., Python, Bash) for automation and building tooling.
  • Direct experience with MLOps‑specific tools such as MLflow, Kubeflow, DVC, Seldon Core, or cloud‑native equivalents (e.g., Amazon SageMaker, Vertex AI).
  • Familiarity with the machine learning lifecycle and its unique challenges (e.g., experiment tracking, data versioning, model monitoring).
  • Experience supporting ML systems in the ride‑hailing industry, particularly around dynamic pricing, would be a strong plus.
  • Proficiency with workflow orchestration tools like Apache Airflow is highly desirable.
  • Experience with monitoring and logging tools like Guance, Prometheus, Grafana, or the ELK stack.
  • A proactive and analytical approach to problem‑solving, with a systems‑thinking mindset.
  • Strong ownership mentality with the ability to manage critical infrastructure and platforms independently.
  • Excellent communication skills to collaborate effectively with cross‑functional technical teams.