National Cancer Centre of Singapore Pte Ltd
Job Category: Research
Posting Date: 11 Apr 2025
NCCS Data and Computational Science (DCS) is a newly established computational hub within the National Cancer Centre of Singapore (NCCS) that focuses on leveraging data analytics and computational methods to advance cancer research and treatment. DCS features high-powered computing resources capable of processing 'big data' profiles and running advanced interpretable machine learning algorithms and robust statistical techniques. DCS offers in-house, centralised solutions for NCCS researchers who require computational analysis, removing the need to buy specialised equipment or contract with third-party vendors. DCS aims to maximise the efficiency of data processes and accelerate research outcomes. With access to national-level medical data spanning clinical, imaging and omics datasets, our efforts are concentrated on harvesting the innate value of these rich datasets to improve cancer patient care and treatment delivery through the production of world-class research.
The Data Scientist will provide support and insights for DCS's core multi-omics research and big data analysis computing infrastructure, which focuses on using next-generation sequencing (NGS), radiological imaging and other multi-modal data types to develop biomarkers predictive of clinical responses in cancer patients. The Data Scientist will be expected to wrangle and manage large datasets, execute computational pipelines and interpret data to better understand the complexity of cancer progression and treatment resistance across multiple cancer types, as well as design new pipelines and/or optimise existing ones to meet evolving research needs.
Key Responsibilities:
- Drive and manage multiple projects: Plan and perform sequencing data analyses, statistical analyses and other relevant computational analyses, and interpret the results in the context of assigned research projects/questions.
- Collaboration: Work closely with the Principal Investigator, clinicians, data scientists, researchers and other stakeholders.
- Data pipeline management: Design, implement and maintain data processing pipelines for ingesting, transforming and loading data from various sources.
- Security and compliance: Define computing resource and data access controls, encryption, and authentication mechanisms. Ensure compliance with data privacy regulations (e.g. GDPR, PDPA and other APAC data protection laws) and organisational policies.
- IT infrastructure maintenance: Monitor system performance, identify and resolve bottlenecks or issues, and ensure minimal downtime. Apply software updates and patches. Back up data to prevent data loss. Source and liaise with vendors in procuring IT infrastructure to support the team's expansion needs as necessary.
Job Requirements:
- Master's degree or PhD in Computational Biology, Computer Science, Mathematics, Biostatistics or Physics preferred.
- At least 3 years of research experience in a genomics-related field is mandatory.
- Knowledge and experience in omics analyses (any of WGS, WES, RNA-seq, single-cell sequencing or similar) is mandatory.
- Knowledge and experience in biostatistical analyses of clinical datasets preferred.
- Familiarity with data security and access control measures preferred.
- Ability to plan and execute data analysis and ad-hoc projects, both independently and in collaboration with external parties.
- Strong organisational, interpersonal and presentation skills.
- Familiarity with Linux or other Unix variants preferred, ideally with administrator/superuser and server maintenance experience.
- Familiarity with pipeline management systems (e.g. Nextflow), job schedulers (e.g. Slurm) and container/virtualisation systems (e.g. Docker, Singularity).