Research Engineer - Data Engine

Menlo Research Pte Ltd

Singapore

On-site

SGD 80,000 - 100,000

Full time

Today

Job summary

A leading robotics research firm in Singapore is seeking a Data Infrastructure Engineer to architect and maintain the data platform supporting its robot learning stack. You'll ensure high-quality data is captured and made available for training large-scale models. Candidates should have a strong background in distributed systems and proficiency in Python, Go, or C++; experience with Kubernetes and data pipelines is also required.

Skills

Distributed systems
Data infrastructure
Robotics data pipelines
Python
Go
C++
Kubernetes
Airflow
NATS

Tools

Cloud integration (S3, NFS, gRPC)

Job description

About Us

Our robots generate massive multi-modal data streams: video, audio, proprioception, and control trajectories. To learn from this at scale, we're building a robot data engine that turns real-world experience into structured training data for our foundation models. This role sits at the core of that system, creating the data and compute infrastructure that makes large-scale embodied learning possible.

Role Overview

You will architect and maintain the data platform powering our robot learning stack, ensuring high-quality fleet data is captured, synchronized, labeled, and available for large-scale training. You will work across edge devices, on-prem clusters, and cloud infrastructure to build robust, automated, and scalable data flows.

Responsibilities

  • Design and maintain ETL pipelines to collect, synchronize, and process data from distributed robot fleets.
  • Implement intelligent triggers to capture the most informative episodes for learning (e.g., manipulation failures, locomotion drift).
  • Develop multi-modal data storage and query systems for video, audio, proprioception, and action data.
  • Automate annotation and labeling pipelines using AI-assisted tools.
  • Integrate on-device logging with cloud pipelines for seamless dataset creation.
  • Provide training-ready datasets to autonomy teams and monitor data quality at scale.

Preferred Qualifications

  • Strong background in distributed systems, data infrastructure, or robotics data pipelines.
  • Proficiency in Python, Go, or C++.
  • Experience with Kubernetes, Airflow, or NATS.
  • Understanding of multimodal data handling and large-scale dataset design.
  • Familiarity with robotics telemetry, on-robot logging, and cloud integration (S3, NFS, gRPC).
  • Experience designing metrics dashboards and automating feedback loops between data and model performance.

Bonus Skills

  • Built or contributed to robotic fleet data systems.
  • Experience with foundation model data curation (tokenization, sharding, filtering).
  • Strong interest in enabling embodied AI through scalable data infrastructure.