Senior Data Engineer

StarHub Ltd

Singapore

On-site

SGD 70,000 - 90,000

Full time

Posted yesterday

Job summary

A leading telecommunications company in Singapore seeks an experienced Data Engineer to design and deploy AI-powered, cloud-based solutions. The ideal candidate will work with large datasets, build scalable data pipelines using technologies like PySpark and AWS, and collaborate with cross-functional teams to ensure data quality and performance optimization. A Bachelor's or Master's degree in a relevant field and 4+ years of experience are essential for this role.

Qualifications

  • 4+ years in data engineering, analytics, or a related AI/ML role.
  • Experience with ETL/ELT workflows and automation.
  • Familiar with data modeling, indexing, and schema evolution.

Responsibilities

  • Design, develop, and deploy AI-powered, cloud-based products.
  • Ensure data quality, build scalable pipelines, and optimize performance.
  • Collaborate with data scientists and other stakeholders.

Skills

Python for ETL/data engineering
Spark (PySpark)
Big Data frameworks
SQL engines (Spark SQL, Redshift, PostgreSQL)
Airflow
GitLab CI/CD or Jenkins
Relational and NoSQL databases
Infrastructure as Code (Terraform, CloudFormation)
MLOps/LLMOps
Problem-solving skills

Education

Bachelor’s or Master’s in Computer Science, Software Engineering, or Data Science

Tools

AWS (S3, EMR, Redshift)
Docker
Kubernetes

Job description

Overview

You will design, develop, and deploy AI-powered, cloud-based products. As a Data Engineer, you’ll work with large-scale, heterogeneous datasets and hybrid cloud architectures to support analytics and AI solutions. You will collaborate with data scientists, infrastructure engineers, sales specialists, and other stakeholders to ensure data quality, build scalable pipelines, and optimize performance. Your work will integrate telco data with other verticals (retail, healthcare), automate DataOps/MLOps/LLMOps workflows, and deliver production-grade systems.

Responsibilities
  • Ensure Data Quality & Consistency
    • Validate, clean, and standardize data (e.g., geolocation attributes) to maintain integrity.
    • Define and implement data quality metrics (completeness, uniqueness, accuracy) with automated checks and reporting.
  • Build & Maintain Data Pipelines
    • Develop ETL/ELT workflows (PySpark, Airflow) to ingest, transform, and load data into warehouses and data stores (S3, Postgres, Redshift, MongoDB); see the sketch after this list.
    • Automate DataOps/MLOps/LLMOps pipelines with CI/CD (Airflow, GitLab CI/CD, Jenkins), including model training, deployment, and monitoring.
  • Design Data Models & Schemas
    • Translate requirements into normalized/denormalized structures, star/snowflake schemas, or data vaults.
    • Optimize storage (tables, indexes, partitions, materialized views, columnar encodings) and tune queries (sort/distribution keys, vacuum).
  • Integrate & Enrich Telco Data
    • Map 4G/5G infrastructure metadata to geospatial context, augment 5G metrics with legacy 4G, and create unified time-series datasets.
    • Consume analytics/ML endpoints and real-time streams (Kafka, Kinesis), designing aggregated-data APIs with proper versioning (Swagger/OpenAPI).
  • Manage Cloud Infrastructure
    • Provision and configure resources (AWS S3, EMR, Redshift, RDS) using IaC (Terraform, CloudFormation), ensuring security (IAM, VPC, encryption).
    • Monitor performance (CloudWatch, Prometheus, Grafana), define SLAs for data freshness and system uptime, and automate backups/DR processes.
  • Collaborate Cross-Functionally & Document
    • Clarify objectives with data owners, data scientists, and stakeholders; partner with infra and security teams to maintain compliance (PDPA, GDPR).
    • Document schemas, ETL procedures, and runbooks; enforce version control and mentor junior engineers on best practices.
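
As an illustration of the pipeline-automation and data-quality responsibilities above, the following is a minimal Airflow sketch: a PySpark transformation task followed by an automated data-quality check. The DAG name, connection ID, job path, and metric values are hypothetical placeholders, and a production check would compute its metrics from the warehouse rather than the stubbed values shown.

```python
# Illustrative Airflow DAG: run a PySpark transformation, then a data-quality check.
# All identifiers (DAG id, connection id, file paths) are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator


def check_data_quality(**context):
    """Sketch of an automated completeness/uniqueness check with a hard threshold."""
    # A real implementation would query Redshift/Postgres and publish the metrics;
    # the stubbed values below only show where the check would fail the run.
    metrics = {"completeness": 0.998, "uniqueness": 1.0}
    if metrics["completeness"] < 0.99:
        raise ValueError(f"Completeness below threshold: {metrics['completeness']:.3f}")


with DAG(
    dag_id="telco_geolocation_etl",                # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    transform = SparkSubmitOperator(
        task_id="pyspark_transform",
        application="jobs/clean_geolocation.py",   # hypothetical PySpark job
        conn_id="spark_default",
    )

    quality_check = PythonOperator(
        task_id="data_quality_check",
        python_callable=check_data_quality,
    )

    # Quality checks gate downstream consumers: the run fails if the data does not pass.
    transform >> quality_check
```

In practice, a CI/CD job (GitLab CI/CD or Jenkins) would lint, test, and deploy DAGs like this one, which is the automation referred to above.
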
Qualifications
  • Bachelor’s or Master’s in Computer Science, Software Engineering, Data Science, or equivalent experience
  • 4+ years in data engineering, analytics, or a related AI/ML role
  • Proficient in Python for ETL/data engineering and Spark (PySpark) for large-scale pipelines
  • Experience with Big Data frameworks and SQL engines (Spark SQL, Redshift, PostgreSQL) for data marts and analytics
  • Hands-on with Airflow (or equivalent) to orchestrate ETL workflows and GitLab CI/CD or Jenkins for pipeline automation
  • Familiar with relational (PostgreSQL, Redshift) and NoSQL (MongoDB) stores: data modeling, indexing, partitioning, and schema evolution
  • Proven ability to implement scalable storage solutions: tables, indexes, partitions, materialized views, columnar encodings (illustrated after this list)
  • Skilled in query optimization: execution plans, sort/distribution keys, vacuum maintenance, and cost-optimization strategies (cluster resizing, Spectrum)
  • Experience with cloud platforms (AWS: S3, EMR, Glue, Redshift) and containerization (Docker, Kubernetes)
  • Infrastructure as Code using Terraform or CloudFormation for provisioning and drift detection
  • Knowledge of MLOps/LLMOps: auto-scaling ML systems, model registry management, and CI/CD for model deployment
  • Strong problem-solving, attention to detail, and the ability to collaborate with cross-functional teams
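
As an illustration of the partitioned, columnar storage pattern referenced above, the following is a minimal PySpark sketch; the bucket paths and column names are hypothetical, and it assumes an EMR-style environment where s3:// paths resolve directly.

```python
# Illustrative PySpark job: derive a partition column and write columnar (Parquet) output.
# Bucket names, paths, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partitioned-write-example").getOrCreate()

events = (
    spark.read.json("s3://example-raw-bucket/network-events/")   # hypothetical source
    .withColumn("event_date", F.to_date("event_timestamp"))
)

# Partitioning by date keeps scans narrow for time-series queries; Parquet provides
# columnar encoding and compression for the analytics engines listed above.
(
    events.repartition("event_date")
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-bucket/network-events/")
)
```
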
Nice to Have
  • Exposure to serverless architectures (AWS Lambda) for event-driven pipelines
  • Familiarity with vector databases, data mesh, or lakehouse architectures
  • Experience using BI/visualization tools (Tableau, QuickSight, Grafana) for data quality dashboards
  • Hands-on with data quality frameworks (Deequ) or LLM-based data applications (natural-language-to-SQL generation)
  • Participation in GenAI POCs (RAG pipelines, Agentic AI demos, geomobility analytics)
  • Client-facing or stakeholder-management experience in data-driven/AI projects