Research Engineer - Data Engine

Menlo Research Pte Ltd

Singapore

On-site

SGD 80,000 - 100,000

Full time

Today

Job summary

A leading robotics research firm in Singapore is seeking a Data Infrastructure Engineer to architect and maintain the data platform supporting its robot learning stack. You'll ensure high-quality data is captured and made available for training large-scale models. Candidates should have a strong background in distributed systems and proficiency in Python, Go, or C++; experience with Kubernetes and data pipelines is also required.

Skills

Distributed systems
Data infrastructure
Robotics data pipelines
Python
Go
C++
Kubernetes
Airflow
NATS

Tools

Cloud integration (S3, NFS, gRPC)

Job description

About Us

Our robots generate massive multi-modal data streams: video, audio, proprioception, and control trajectories. To learn from this at scale, we're building a robot data engine that turns real-world experience into structured training data for our foundation models. This role sits at the core of that system, creating the data and compute infrastructure that makes large-scale embodied learning possible.

Role Overview

You will architect and maintain the data platform powering our robot learning stack, ensuring high-quality fleet data is captured, synchronized, labeled, and available for large-scale training. You will work across edge devices, on-prem clusters, and cloud infrastructure to build robust, automated, and scalable data flows.

Responsibilities

  • Design and maintain ETL pipelines to collect, synchronize, and process data from distributed robot fleets.
  • Implement intelligent triggers to capture the most informative episodes for learning (e.g., manipulation failures, locomotion drift).
  • Develop multi-modal data storage and query systems for video, audio, proprioception, and action data.
  • Automate annotation and labeling pipelines using AI-assisted tools.
  • Integrate on-device logging with cloud pipelines for seamless dataset creation.
  • Provide training-ready datasets to autonomy teams and monitor data quality at scale.

Preferred Qualifications

  • Strong background in distributed systems, data infrastructure, or robotics data pipelines.
  • Proficiency in Python, Go, or C++.
  • Experience with Kubernetes, Airflow, or NATS.
  • Understanding of multimodal data handling and large-scale dataset design.
  • Familiarity with robotics telemetry, on-robot logging, and cloud integration (S3, NFS, gRPC).
  • Experience designing metrics dashboards and automating feedback loops between data and model performance.

Bonus Skills

  • Built or contributed to robotic fleet data systems.
  • Experience with foundation model data curation (tokenization, sharding, filtering).
  • Strong interest in enabling embodied AI through scalable data infrastructure.