AI-Driven Big Data Engineer (PhD Required)

Pixalate, Inc

Singapore

Remote

SGD 90,000 - 130,000

Full time

10 days ago

Job summary

A leading technology company is seeking an AI-Driven Big Data Engineer to work remotely from Singapore. The role focuses on developing intelligent, self-healing data systems and implementing innovative AI solutions. Candidates must hold a PhD in Computer Science or a related field and have experience with distributed systems and large datasets. This position offers the opportunity to apply cutting-edge AI research in practical, large-scale applications.

Qualifications

  • PhD in a relevant field or exceptional Master's with research experience.
  • Published research in distributed computing or ML infrastructure.
  • Experience with large datasets and lakehouse architectures.

Responsibilities

  • Design autonomous pipelines for data optimization.
  • Implement ML-driven anomaly detection for large datasets.
  • Develop real-time feature stores for transactions.

Skills

Expert SQL
Python
Scala/Java
Spark
Kafka
MLflow
KerasTuner

Education

PhD in Computer Science, Data Science, or Distributed Systems

Tools

BigQuery
Dataflow
Databricks

Job description

AI-Driven Big Data Engineer
Employment Type: Full-Time
Location: Remote, Singapore
Level: Entry to Mid-Level (PhD Required)

Bridge Cutting-Edge AI Research with Petabyte-Scale Data Systems

Pixalate is an online trust and safety platform that protects businesses, consumers, and children from deceptive, fraudulent, and non-compliant mobile apps, CTV apps, and websites. We're seeking a PhD-level Big Data Engineer to revolutionize how AI transforms massive-scale data operations.

Our impact is real and measurable. Our software has uncovered:

About the Role

You'll work at the intersection of big data and AI, developing intelligent, self-healing data systems that process trillions of data points daily. You'll have the autonomy to pursue research in distributed ML systems and AI-enhanced data optimization, with your innovations deployed at massive scale within months, not years.

This isn't traditional data engineering - you'll implement agentic AI for autonomous pipeline management, leverage LLMs for data quality assurance, and create ML-optimized architectures that redefine what's possible at petabyte scale.
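
For illustration only, here is a minimal sketch of one building block behind such self-healing pipelines: an ML-based anomaly check over per-partition data-quality metrics, using scikit-learn's IsolationForest. The metric columns, values, and thresholds are hypothetical stand-ins for whatever a production pipeline would emit; this is not Pixalate's actual system.

    # Illustrative sketch only: flag anomalous pipeline partitions from
    # data-quality metrics with scikit-learn's IsolationForest. The column
    # names and values below are hypothetical, not Pixalate's schema.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    # Hypothetical daily metrics emitted by an ingestion pipeline.
    metrics = pd.DataFrame({
        "row_count": rng.normal(1_000_000, 50_000, 30),
        "null_ratio": rng.normal(0.01, 0.002, 30),
        "dup_ratio": rng.normal(0.005, 0.001, 30),
    })
    metrics.loc[29] = [400_000, 0.20, 0.05]  # simulate one bad partition

    # Fit on the history and flag partitions whose profile looks anomalous;
    # a self-healing pipeline could quarantine or reprocess flagged days.
    detector = IsolationForest(contamination=0.05, random_state=0).fit(metrics)
    metrics["is_anomaly"] = detector.predict(metrics) == -1

    print(metrics[metrics["is_anomaly"]])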

Key Research Areas & Responsibilities
AI-Enhanced Data Infrastructure
  • Design intelligent pipelines with autonomous optimization and self-healing capabilities using agentic AI
  • Implement ML-driven anomaly detection for terabyte-scale datasets
Distributed Machine Learning at Scale
  • Build distributed ML pipelines
  • Develop real-time feature stores for billions of transactions
  • Optimize feature engineering with AutoML and neural architecture search
Required Qualifications
Education & Research
  • PhD in Computer Science, Data Science, or Distributed Systems (exceptional Master's with research experience considered)
  • Published research or expertise in distributed computing, ML infrastructure, or stream processing
Technical Expertise
  • Core Languages: Expert SQL (window functions, CTEs), Python (Pandas, Polars, PyArrow), Scala/Java
  • Big Data Stack: Spark 3.5+, Flink, Kafka, Ray, Dask
  • Storage & Orchestration: Delta Lake, Iceberg, Airflow, Dagster, Temporal
  • Cloud Platforms: GCP (BigQuery, Dataflow, Vertex AI), AWS (EMR, SageMaker), Azure (Databricks)
  • ML Systems: MLflow, Kubeflow, Feature Stores, Vector Databases, scikit-learn + search CV, H2O AutoML, auto-sklearn, GCP Vertex AI AutoML Tables
  • Neural Architecture Search: KerasTuner, AutoKeras, Ray Tune, Optuna, PyTorch Lightning + Hydra
Research Skills
  • Track record with 100TB+ datasets
  • Experience with lakehouse architectures, streaming ML, and graph processing at scale
  • Understanding of distributed systems theory and ML algorithm implementation
Preferred Qualifications
  • Experience applying LLMs to data engineering challenges
  • Ability to translate complex AutoML/NAS research into practical production workflows
  • Hands-on project examples of feature engineering automation or NAS experiments (see the brief sketch after this list)
  • Proven success in automating ML pipelines, from raw data to an optimized model architecture
  • Contributions to Apache projects (Spark, Flink, Kafka)
  • Knowledge of privacy-preserving techniques and data mesh architectures
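
As a purely illustrative example of the AutoML/NAS and pipeline-automation items above, the following sketch runs an automated hyperparameter search with Optuna over a scikit-learn model on a toy dataset; the search space, model, and trial count are hypothetical placeholders, not a prescribed workflow.

    # Illustrative sketch only: automated hyperparameter search with Optuna
    # on a toy dataset. The search space and model are hypothetical.
    import optuna
    from sklearn.datasets import load_digits
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_digits(return_X_y=True)

    def objective(trial: optuna.Trial) -> float:
        # Optuna proposes a configuration; we score it with 3-fold CV.
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 50, 300),
            "max_depth": trial.suggest_int("max_depth", 2, 6),
            "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        }
        model = GradientBoostingClassifier(**params)
        return cross_val_score(model, X, y, cv=3).mean()

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)
    print(study.best_params, study.best_value)

The same search pattern is what tools such as Ray Tune and KerasTuner scale out across larger models and search spaces.
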
What Makes This Role Unique

You'll work with one of the few truly petabyte-scale production datasets outside of major tech companies, with the freedom to experiment with cutting-edge approaches. Unlike traditional big data roles, you'll apply the latest AI research to fundamental data challenges - from using LLMs to understand data quality issues to implementing agentic systems that autonomously optimize and heal data pipelines.
