Senior Backend Engineer, Data Mining

MOTIONAL SINGAPORE PTE. LIMITED

Singapore

On-site

SGD 100,000 - 150,000

Full time

Today

Job summary

A leading autonomous driving technology company in Singapore is seeking a Senior Backend Engineer to architect and own the production systems for multimodal data mining. Applicants should have 6+ years of experience building large-scale distributed systems, deep expertise in Ray or Spark, and strong proficiency in Python. The role offers an opportunity to directly impact the quality and speed of data insights in autonomous driving applications.

Qualifications

6+ years of experience in large-scale distributed systems.
Experience designing and optimizing production data pipelines.
Strong proficiency in Python and software engineering practices.

Responsibilities

Architect high-throughput, low-latency backend systems.
Own the complete data journey for multimodal pipelines.
Build monitoring and alerting for data preprocessing.

Skills

Python proficiency

Distributed data processing

SQL and data manipulation

Cost optimization

Production data pipeline optimization

Education

BS in Computer Science or related field

Tools

AWS (S3, EC2, EKS, EMR)

Ray

Spark

Roles & Responsibilities

Mission Summary:

At Motional, we’re transforming how autonomous vehicles discover critical intelligence hidden within petabytes of multimodal sensor data. Our next-generation autonomous driving stack depends on finding the rare edge cases, long-tail scenarios, and model errors that matter most. OmniTag, our ML-powered multimodal data mining framework, is the engine that powers this discovery.

As a Senior Backend Engineer on the Data Mining team, you’ll architect and own the production systems that enable data scientists and ML engineers to rapidly mine, analyze, and extract insights from billions of data points across cameras, LiDAR, radar, and other modalities. You won’t just maintain a platform; you’ll evolve its core foundation, ensuring OmniTag scales to support Motional’s most ambitious autonomy challenges. Your work directly impacts the quality and speed at which we improve our perception and planning models.

What You’ll Do:

Architect the OmniTag Engine: Design and build the high‑throughput, low‑latency backend systems that execute billion‑scale inference across Ray/Spark, transforming raw sensor data into unified multimodal representations. Optimize for both query latency and resource efficiency in a cost‑sensitive, cloud‑based environment.
Scale Multimodal Data Pipelines: Own the complete data journey, from ingestion, normalization, and preprocessing of heterogeneous modalities (image, video, LiDAR, audio) through encoding, indexing, and cached embedding storage. Ensure pipelines are robust, observable, and meet the SLOs expected by downstream ML teams.
Evolve the Vector Search and Retrieval Engine: Enhance our in‑house billion‑scale vector search engine to power RAG‑driven few‑shot dataset creation. Optimize embedding storage, retrieval performance, and filtering across billions of examples to enable rapid interactive mining workflows.
Own Data Quality and Observability: Build comprehensive monitoring, logging, and alerting for multimodal data preprocessing pipelines. Develop data validation frameworks that catch regressions in data alignment, normalization, or encoding quality, all of which are critical for maintaining model performance.
Collaborate on Encoder‑Decoder Adaptation: Work closely with ML engineers to support domain‑specific fine‑tuning workflows, model versioning, and A/B testing of new encoders and decoders. Ensure the backend infrastructure enables rapid experimentation with emerging open‑source multimodal foundation models.
Drive Production Reliability: Establish patterns for graceful degradation, fault tolerance, and cost optimization. Operate OmniTag as a mission‑critical data platform serving the entire ML organization, with a focus on reliability, debuggability, and operational excellence.

What We’re Looking For:

BS in Computer Science or a related field, or equivalent professional experience.
6+ years designing, building, and operating large‑scale distributed systems in production environments.
Deep, hands‑on expertise with Ray or Spark (or both) for distributed data processing and large‑scale inference workloads.
Expert‑level Python proficiency with strong software engineering fundamentals: testing (unit, integration, and end‑to‑end), CI/CD pipelines, containerization, and code review practices.
Proven experience optimizing and scaling production data pipelines that process terabytes or petabytes of data.
Strong SQL and data manipulation skills; comfort with both structured and semi‑structured data.
Experience with cloud infrastructure (AWS preferred: S3, EC2, EKS, EMR, IAM) and infrastructure‑as‑code patterns.
Demonstrated track record of shipping robust, well‑tested, production‑grade systems and mentoring junior engineers.

Bonus Points:

MS/PhD in Computer Science, Machine Learning, or a related field.
Experience building or scaling vector databases, large‑scale information retrieval systems, or similarity search engines.
Hands‑on work with multimodal machine learning models, foundation models (LLMs/VLMs), or embeddings‑based systems.
Familiarity with ML frameworks (PyTorch, JAX) and the ecosystem around multimodal models.
Production experience with workflow orchestration (Airflow, Kubeflow, Dagster) and stream processing (Kafka, Flink).
Understanding of model serving patterns, feature stores, or ML‑ops infrastructure.
Domain knowledge in autonomous driving, computer vision, or sensor fusion.
Experience with ML‑based data mining, active learning, or contrastive learning approaches.
