Big Data Engineer (Cancer Science Institute)

NATIONAL UNIVERSITY OF SINGAPORE

Singapore

On-site

SGD 80,000 - 100,000

Full time

Job summary

A leading educational institution in Singapore is seeking a skilled Big Data Engineer to join its Cancer Science Institute. The role involves architecting data automation, managing AWS infrastructure, and ensuring data compliance for extensive cancer research projects. Candidates should have a Bachelor’s degree in Computer Science or a related field and at least 2 years of experience in Data Engineering. Proficiency in Python and Infrastructure as Code tools is critical, along with experience in cloud environments. The position offers a dynamic team atmosphere focused on cutting-edge data analytics.

Job description

Interested applicants are invited to apply directly at the NUSCareer Portal.

Your application will be processed only if you apply via the NUSCareer Portal.

We regret that only shortlisted candidates will be notified.

The Cancer Science Institute of Singapore – a part of the National University of Singapore – is seeking a skilled Big Data Engineer to join the Genomics and Data Analytics Core (GeDaC). We operate a petabyte-scale "Data Nexus" that serves as the foundation for a production AI Factory in cancer and human disease research.

You do not need a background in biology. We are looking for a pure engineer who understands data logistics, infrastructure, and scale.

The Team & Leadership

You will join a highly specialized technical team comprising an experienced Cloud/HPC Architect, an agile Full-Stack Developer, and a senior IT Manager. Crucially, you will report to a Facility Head with deep, hands‑on expertise in petabyte‑scale data‑intensive computing and DataOps. This ensures you will work in an environment where technical complexity is understood, architectural decisions are respected, and job scope is managed with engineering reality in mind.

Key Responsibilities

  • Data Ingestion & Logistics: Architect and maintain robust automation for ingesting raw data from sequencing instruments to our hybrid storage systems. You will own the "handshakes" that ensure data moves reliably from edge to cloud (a minimal sketch of this handshake follows this list).
  • Infrastructure as Code (IaC): Manage and deploy AWS resources (S3, Lambda, DynamoDB, RDS) using AWS CloudFormation, ensuring our infrastructure is reproducible, version‑controlled, and follows DevSecOps best practices.
  • Technical Compliance & Provenance: Implement the technical controls for data governance. This includes designing immutable audit logs, automated access control policies, and lineage tracking systems to satisfy regulatory requirements (no manual report writing required; see the audit-log sketch after this list).
  • Hybrid Cloud Synchronization: Manage the lifecycle of data moving between on‑premise HPC and AWS S3 Intelligent‑Tiering/Glacier to balance high‑performance availability with long‑term cost optimization (see the lifecycle sketch after this list).
  • Pipeline Integration: Work closely with the Senior HPC Engineer to ensure data is correctly staged for Nextflow/Kubernetes processing pipelines, and capture the outputs back into the data lake/warehouse.
  • Database Management: Maintain the SQL and NoSQL databases that serve as the "source of truth" for file metadata, ensuring the Full‑Stack team has low‑latency API access to query file status.
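
To make the ingestion and database duties concrete: a minimal Python sketch of the edge-to-cloud handshake, assuming hypothetical bucket/table names and a single-part, non-KMS upload (where S3's ETag equals the object's MD5):

    import hashlib
    from datetime import datetime, timezone
    from pathlib import Path

    import boto3

    s3 = boto3.client("s3")
    metadata = boto3.resource("dynamodb").Table("file-metadata")  # hypothetical table

    def md5_of(path: Path) -> str:
        # Stream the file so multi-gigabyte sequencing runs don't exhaust memory.
        digest = hashlib.md5()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(8 * 1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def ingest(path: Path, bucket: str, key: str) -> None:
        local_md5 = md5_of(path)
        with path.open("rb") as f:
            s3.put_object(Bucket=bucket, Key=key, Body=f)
        # The handshake: for a single-part PUT, S3's ETag is the object's MD5,
        # so a mismatch means the transfer cannot be trusted.
        etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
        if etag != local_md5:
            raise RuntimeError(f"checksum mismatch for {key}")
        # Register the verified object so the Full-Stack team can query its status.
        metadata.put_item(Item={
            "s3_key": key,
            "md5": local_md5,
            "size_bytes": path.stat().st_size,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "status": "VERIFIED",
        })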
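
The "immutable audit logs" control can be enforced at the application layer. A sketch using a DynamoDB condition expression, with a hypothetical table name and attribute set:

    from datetime import datetime, timezone

    import boto3

    audit = boto3.resource("dynamodb").Table("audit-log")  # hypothetical table

    def record_event(event_id: str, actor: str, action: str, s3_key: str) -> None:
        # The condition expression rejects any write whose event_id already
        # exists, so records are append-only at the application level.
        audit.put_item(
            Item={
                "event_id": event_id,
                "actor": actor,
                "action": action,
                "s3_key": s3_key,
                "at": datetime.now(timezone.utc).isoformat(),
            },
            ConditionExpression="attribute_not_exists(event_id)",
        )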
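
And the hybrid tiering maps onto a standard S3 lifecycle policy. A sketch with an illustrative bucket name and prefix, where objects enter Intelligent-Tiering immediately and settle into Glacier Deep Archive after 180 days:

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="gedac-data-nexus",  # hypothetical bucket name
        LifecycleConfiguration={
            "Rules": [{
                "ID": "tier-raw-sequencing-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"},
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }]
        },
    )
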
Requirements

Education: Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field.

Experience:

  • 2+ years of experience in Data Engineering, Backend Development, or DevOps.
  • Demonstrable experience working with commercial cloud infrastructure (AWS preferred).

Technical Skills:

  • Core Logic: Strong proficiency in Python (data tooling, automation scripts).
  • Infrastructure: Experience with Infrastructure as Code (IaC) tools such as AWS CloudFormation and/or Terraform is essential.
  • Data Management: Proficiency with SQL (PostgreSQL/Aurora) and object storage (S3).
  • Environment: Highly comfortable working in Linux/Unix environments.

Attributes:

  • Meticulous: You care deeply about data integrity. A missing file or a broken checksum bothers you.
  • Ownership-driven: You take responsibility for systems you build and operate.
  • Collaborative: You can work effectively within an established technical team, integrating your work with existing APIs and processing pipelines.

Preferred Experience

  • Experience with workflow managers like Nextflow or container orchestration via Kubernetes.
  • Experience with hybrid‑cloud data transfer tools (e.g., AWS DataSync, Storage Gateway).
  • Knowledge of searching/indexing tools like Elasticsearch or OpenSearch.