Data Engineer (Senior)

BETTERDATA PTE. LTD.

Singapore

Hybrid

SGD 90,000 - 120,000

Full time

Job summary

A data-driven technology company in Singapore is seeking a Senior Data Engineer to build and maintain data infrastructure for scalable solutions. The ideal candidate has strong experience in scaling data and machine learning systems. Key responsibilities include architecting data pipelines, ensuring data quality, and collaborating with multiple teams. This role offers flexible work arrangements and equity eligibility.

Benefits

Flexible time-off arrangements
Work from office or WFH on some days
Competitive equity packages

Qualifications

  • 3+ years of experience in building scalable data solutions.
  • Expertise in automated data quality frameworks.
  • Hands-on experience with web scraping tools.

Responsibilities

  • Build data ingestion pipelines from enterprise relational databases.
  • Design scalable data pipelines for batch processing.
  • Implement monitoring and alerting for pipeline health.

Skills

Scaling data pipelines
Machine learning systems
ETL/ELT pipelines
Web scraping

Education

Bachelor's degree in Computer Science or related field

Tools

Python
Pandas
Spark
Airflow

Job description

Who We Are Looking For:

We are seeking an experienced Data Engineer (Senior) to build and maintain data infrastructure that converts our research into scalable, production-ready solutions for synthetic tabular data generation. You will also architect and operate our large-scale data curation, scraping, and cleaning pipelines to deliver massive datasets for pretraining and finetuning large language models on tabular and unstructured domains.

This is an individual contributor (IC) role suited for someone who thrives in a fast-paced, early-stage start-up environment. The ideal candidate has experience scaling data and machine learning systems to handle datasets with billions of records and can build and optimize complex data pipelines for enterprise applications. You'll work closely with software, machine learning and applied research teams to optimize performance and ensure seamless integration of systems, handling data from financial institutions, government agencies, consumer brands and more.

Key Responsibilities:
Data Infrastructure and Pipeline Development:
  • Build data ingestion pipelines from enterprise relational databases (e.g. Oracle, SQL Server, PostgreSQL, MySQL, Databricks, Snowflake, BigQuery) and files (e.g. Parquet, CSV) for large-scale synthetic data pipelines.
  • Design scalable data pipelines for batch processing.
  • Architect and maintain data warehouses and data lakes (e.g. Delta Lake) optimized for synthetic data training and generation workflows.
  • Seamlessly transform Pandas-based research code into production-ready pipelines.
  • Build automated data quality monitoring and validation systems to ensure data integrity throughout the pipeline lifecycle (see the illustrative sketch after this list).
  • Implement comprehensive data lineage tracking and audit capabilities for regulatory compliance and privacy validation.
  • Design robust error handling mechanisms, with automatic retries and data recovery in case of pipeline failures.
  • Track performance metrics such as data throughput, latency, and processing times to ensure efficient pipeline operations at scale.
  • Implement monitoring and alerting (e.g. Prometheus, Grafana) for pipeline health, throughput, and data quality metrics.
  • Optimize resource allocation and cost efficiency for distributed processing at terabyte-to-petabyte scale.
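
For illustration only, not part of the role requirements: a minimal sketch of the kind of batch ingestion step with automated quality checks and retries described above, using Airflow and Pandera (both named in this listing). The table name, schema, and connection string are hypothetical.

    # Illustrative sketch; the table, schema, and connection string are hypothetical.
    import pandas as pd
    import pandera as pa
    from airflow.decorators import dag, task
    from pendulum import datetime

    # Hypothetical data-quality contract: rule-based format and range validation.
    schema = pa.DataFrameSchema({
        "txn_id": pa.Column(str, unique=True),
        "amount": pa.Column(float, pa.Check.ge(0)),
        "currency": pa.Column(str, pa.Check.isin(["SGD", "USD", "EUR"])),
    })

    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def batch_ingest():
        @task(retries=3)  # automatic retries on transient failures
        def extract() -> str:
            # Hypothetical source: pull one daily batch from a relational database.
            df = pd.read_sql("SELECT * FROM transactions", "postgresql://user:pw@host/db")
            path = "/tmp/transactions.parquet"
            df.to_parquet(path)
            return path

        @task
        def validate(path: str) -> str:
            # Fail the run (and trigger alerting) if the batch violates the contract.
            schema.validate(pd.read_parquet(path), lazy=True)
            return path

        validate(extract())

    batch_ingest()
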
Massive-Scale Data Collection & Ingestion:
  • Design and build distributed web scraping clusters to extract data from millions of pages.
  • Build LLM-aided data filtering systems that use automated model scoring to evaluate and prioritize high-quality content (an illustrative sketch follows this list).
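
Also purely illustrative: a minimal Scrapy spider of the kind that could feed the scraping and LLM-aided filtering work above. The seed URL, CSS selectors, and quality threshold are assumptions, and the LLM scorer is stubbed with a placeholder heuristic.

    # Illustrative sketch; the URL, selectors, and threshold are hypothetical.
    import scrapy

    def llm_quality_score(text: str) -> float:
        """Stub for an LLM-aided scorer that rates content quality in [0, 1]."""
        return 1.0 if len(text.split()) > 50 else 0.0  # placeholder heuristic

    class TablePageSpider(scrapy.Spider):
        name = "table_pages"
        start_urls = ["https://example.com/datasets"]  # hypothetical seed URL

        def parse(self, response):
            text = " ".join(response.css("p::text").getall())
            if llm_quality_score(text) >= 0.5:  # keep only high-scoring pages
                yield {"url": response.url, "text": text}
            # Follow pagination links to reach the rest of the site.
            for href in response.css("a.next::attr(href)").getall():
                yield response.follow(href, callback=self.parse)
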
Understanding of ML concepts and algorithms:
  • Fair understanding of machine learning concepts, training workflows, and algorithms, with familiarity with tools such as PyTorch and Hugging Face.
Documentation & Reporting:
  • Create clear documentation of data pipelines, workflows, and system architectures to enable smooth handovers and collaboration across teams.
Qualifications:
  • Bachelor's degree in Computer Science, Software Engineering, Data Engineering, or a related field, with a strong foundation in distributed systems and data processing.
  • Expert proficiency in scaling data pipelines and machine learning systems to handle billions of rows in enterprise environments.
  • 3+ years of experience in building scalable data solutions with Python and libraries such as:
    Data Science Libraries: Pandas, NumPy, scikit-learn
    Deep Learning Libraries: PyTorch
    Scaling Libraries: Spark, Dask, etc.
    Orchestration Tools: Airflow, Dagster, etc.
    Data Validation: Pandera, Pydantic, etc.
  • Expertise in automated data quality frameworks, including rule-based and AI-based automation for format validation, anomaly detection, and statistical validation.
  • Proficiency in building ETL/ELT pipelines and managing data across relational databases (e.g. PostgreSQL, Oracle Database, SQL Server, MySQL), data lakes (e.g. Delta Lake) and cloud storage.
  • Experience in building data monitoring and alerting systems.
  • Hands-on experience with web scraping tools (Scrapy, Selenium, Puppeteer).
  • Experience building ML data pipelines and supporting infrastructure for training and deploying machine learning models at scale.
Good to Have:
  • Experience with data governance frameworks and compliance requirements (GDPR, CCPA, PDPA) in data processing systems.
  • Experience with containerization and orchestration using Docker, Kubernetes, and cloud-native deployment strategies.
  • Strong knowledge of cloud platforms (AWS, GCP, Azure) and their data services (S3, BigQuery, Data Lake Storage, etc).
Why Join Us:

This is a unique opportunity for someone looking to actively build and scale systems in a fast-moving start-up. If you’ve successfully scaled machine learning and data systems to billions of rows and thrive in a dynamic, hands-on environment, this role is for you.

Benefits:
  • Flexible time-off arrangements
  • Flexible work arrangements - work from office at One North or WFH on some days
  • Equity eligibility: Competitive equity packages, with grant size evaluated based on the candidate’s experience, skills, and impact.
How to apply:

Does this role sound like a good fit for you?

  • We see this first: Submit your application here.
  • We see this last: If the above does not work, you may email your CV (PDF format) to jobs@betterdata.ai.
    Include the title of the role in your subject line.
    Indicate your available start and end dates (DDMMYY - DDMMYY).
    Send along links or supporting information that best showcase the relevant things you have built and done.