AI-Driven Big Data Engineer (PhD Required)

Pixalate, Inc

Singapore

Remote

SGD 90,000 - 130,000

Full time

10 days ago

Job summary

A leading technology company is seeking an AI-Driven Big Data Engineer to work remotely from Singapore. The role focuses on developing intelligent, self-healing data systems and implementing innovative AI solutions. Candidates must hold a PhD in Computer Science or a related field and have experience with distributed systems and large datasets. This position offers the opportunity to apply cutting-edge AI research in practical, large-scale applications.

Qualifications

  • PhD in a relevant field or exceptional Master's with research experience.
  • Published research in distributed computing or ML infrastructure.
  • Experience with large datasets and lakehouse architectures.

Responsibilities

  • Design autonomous pipelines for data optimization.
  • Implement ML-driven anomaly detection for large datasets.
  • Develop real-time feature stores for transactions.

Skills

Expert SQL
Python
Scala/Java
Spark
Kafka
MLflow
KerasTuner

Education

PhD in Computer Science, Data Science, or Distributed Systems

Tools

BigQuery
Dataflow
Databricks

Job description

AI-Driven Big Data Engineer
Employment Type: Full-Time
Location: Remote, Singapore
Level: Entry to Mid-Level (PhD Required)

Bridge Cutting-Edge AI Research with Petabyte-Scale Data Systems

Pixalate is an online trust and safety platform that protects businesses, consumers, and children from deceptive, fraudulent, and non-compliant mobile apps, CTV apps, and websites. We're seeking a PhD-level Big Data Engineer to revolutionize how AI transforms massive-scale data operations.

Our impact is real and measurable. Our software has uncovered:

About the Role

You'll work at the intersection of big data and AI, developing intelligent, self-healing data systems that process trillions of data points daily. You'll have the autonomy to pursue research in distributed ML systems and AI-enhanced data optimization, with your innovations deployed at massive scale within months, not years.

This isn't traditional data engineering - you'll implement agentic AI for autonomous pipeline management, leverage LLMs for data quality assurance, and create ML-optimized architectures that redefine what's possible at petabyte scale.
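
For illustration only, here is a minimal sketch of one building block behind such self-healing pipelines: an ML-based anomaly check over per-partition data-quality metrics, using scikit-learn's IsolationForest. The metric columns, values, and thresholds are hypothetical stand-ins for whatever a production pipeline would emit; this is not Pixalate's actual system.

    # Illustrative sketch only: flag anomalous pipeline partitions from
    # data-quality metrics with scikit-learn's IsolationForest. The column
    # names and values below are hypothetical, not Pixalate's schema.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    # Hypothetical daily metrics emitted by an ingestion pipeline.
    metrics = pd.DataFrame({
        "row_count": rng.normal(1_000_000, 50_000, 30),
        "null_ratio": rng.normal(0.01, 0.002, 30),
        "dup_ratio": rng.normal(0.005, 0.001, 30),
    })
    metrics.loc[29] = [400_000, 0.20, 0.05]  # simulate one bad partition

    # Fit on the history and flag partitions whose profile looks anomalous;
    # a self-healing pipeline could quarantine or reprocess flagged days.
    detector = IsolationForest(contamination=0.05, random_state=0).fit(metrics)
    metrics["is_anomaly"] = detector.predict(metrics) == -1

    print(metrics[metrics["is_anomaly"]])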

Key Research Areas & Responsibilities
AI-Enhanced Data Infrastructure
  • Design intelligent pipelines with autonomous optimization and self-healing capabilities using agentic AI
  • Implement ML-driven anomaly detection for terabyte-scale datasets
Distributed Machine Learning at Scale
  • Build distributed ML pipelines
  • Develop real-time feature stores for billions of transactions
  • Optimize feature engineering with AutoML and neural architecture search
Required Qualifications
Education & Research
  • PhD in Computer Science, Data Science, or Distributed Systems (exceptional Master's with research experience considered)
  • Published research or expertise in distributed computing, ML infrastructure, or stream processing
Technical Expertise
  • Core Languages: Expert SQL (window functions, CTEs), Python (Pandas, Polars, PyArrow), Scala/Java
  • Big Data Stack: Spark 3.5+, Flink, Kafka, Ray, Dask
  • Storage & Orchestration: Delta Lake, Iceberg, Airflow, Dagster, Temporal
  • Cloud Platforms: GCP (BigQuery, Dataflow, Vertex AI), AWS (EMR, SageMaker), Azure (Databricks)
  • ML Systems: MLflow, Kubeflow, Feature Stores, Vector Databases, scikit-learn + search CV, H2O AutoML, auto-sklearn, GCP Vertex AI AutoML Tables
  • Neural Architecture Search: KerasTuner, AutoKeras, Ray Tune, Optuna, PyTorch Lightning + Hydra
Research Skills
  • Track record with 100TB+ datasets
  • Experience with lakehouse architectures, streaming ML, and graph processing at scale
  • Understanding of distributed systems theory and ML algorithm implementation
Preferred Qualifications
  • Experience applying LLMs to data engineering challenges
  • Ability to translate complex AutoML/NAS research into practical production workflows
  • Hands-on project examples of feature engineering automation or NAS experiments (see the brief sketch after this list)
  • Proven success in automating ML pipelines, from raw data to an optimized model architecture
  • Contributions to Apache projects (Spark, Flink, Kafka)
  • Knowledge of privacy-preserving techniques and data mesh architectures
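
As a purely illustrative example of the AutoML/NAS and pipeline-automation items above, the following sketch runs an automated hyperparameter search with Optuna over a scikit-learn model on a toy dataset; the search space, model, and trial count are hypothetical placeholders, not a prescribed workflow.

    # Illustrative sketch only: automated hyperparameter search with Optuna
    # on a toy dataset. The search space and model are hypothetical.
    import optuna
    from sklearn.datasets import load_digits
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_digits(return_X_y=True)

    def objective(trial: optuna.Trial) -> float:
        # Optuna proposes a configuration; we score it with 3-fold CV.
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 50, 300),
            "max_depth": trial.suggest_int("max_depth", 2, 6),
            "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        }
        model = GradientBoostingClassifier(**params)
        return cross_val_score(model, X, y, cv=3).mean()

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)
    print(study.best_params, study.best_value)

The same search pattern is what tools such as Ray Tune and KerasTuner scale out across larger models and search spaces.
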
What Makes This Role Unique

You'll work with one of the few truly petabyte-scale production datasets outside of major tech companies, with the freedom to experiment with cutting-edge approaches. Unlike traditional big data roles, you'll apply the latest AI research to fundamental data challenges - from using LLMs to understand data quality issues to implementing agentic systems that autonomously optimize and heal data pipelines.
