PyTorch MLOps Engineer — Bare-Metal Infra

Second Talent

Singapore

On-site

SGD 103,000 - 156,000

Full time

15 days ago

Job summary

A technology solutions firm is looking for an MLOps Engineer specializing in PyTorch to manage the on-premises infrastructure for advanced training workloads. The role involves architecting training and inference pipelines, ensuring high-quality code, and troubleshooting complex issues. Ideal candidates will have expert knowledge of PyTorch, strong experience in C++ and Python, and a solid background in computer science. This full-time position is based in Singapore and requires a proactive approach to optimizing compute workloads.

Qualifications

  • Expert-level knowledge of PyTorch, including DDP (DistributedDataParallel), mixed-precision training, and TorchScript.
  • Advanced programming skills in both C++ and Python.
  • Solid background in computer science fundamentals including data structures and algorithms.
  • Hands-on experience debugging and tuning bare-metal servers, and with Linux administration.
  • Understanding of low-level networking and distributed training protocols.
  • Proven track record of building reliable, reproducible pipelines.

Responsibilities

  • Architect, build, and maintain end-to-end training and inference pipelines using PyTorch.
  • Develop high-quality tooling in Python and C++ to support the model training lifecycle.
  • Take ownership of the core training codebase, keeping it clear and reproducible.
  • Design workflows for checkpointing, resuming jobs, and model versioning.
  • Optimize compute workloads for bare-metal environments.
  • Troubleshoot low-level issues, including networking bottlenecks and hardware faults.

Skills

Expert-level knowledge of PyTorch
Advanced programming in C++
Advanced programming in Python
Computer science fundamentals
Debugging bare-metal servers
Low-level networking knowledge
Building reproducible pipelines
Experience with job schedulers

Tools

PyTorch
Linux
SLURM