Principal Data Scientist

Wellcome Sanger Institute

Hinxton

On-site

GBP 55,000 - 85,000

Full time

13 days ago

Job summary

A leading research institution seeks a Principal Machine Learning Research Data Scientist to join a collaborative project aimed at leveraging datasets to enhance biological research. The role involves designing foundational models, managing ML projects, and contributing to publications in high-impact scientific journals, all within a supportive and innovative environment.

Qualifications

Ph.D. or M.Sc. with relevant research experience.
Prior ML research experience in academic or scientific environments.
Strong Python skills including relevant libraries.

Responsibilities

Managing and leading machine learning research projects.
Collaborating with team members to evaluate new ML models.
Supervising and training Ph.D. students and postdocs.

Skills

Python

Machine Learning

Data Cleaning

Feature Engineering

Problem Solving

NLP

Education

Ph.D. or M.Sc. in relevant field

Tools

Scikit-Learn

SciPy

TensorFlow

PyTorch

Do you want to help us improve human health and understand life on Earth? Make your mark by shaping the future to enable or deliver life-changing science to solve some of humanity’s greatest challenges.

Principal Research Data Scientist

We seek a Principal Machine Learning Research Data Scientist to join a collaborative project between the Wellcome Sanger Institute and Open Targets. This project aims to leverage datasets generated at the Sanger Institute, together with publicly available data from human cells, to create foundational models for biology, enhancing our understanding of the rules of life and improving health for all. You will work within an interdisciplinary team of life scientists and computer/ML scientists, with a shared goal of advancing biological research through these models. This role will be within the AI/ML Faculty group led by Dr. Mohammad Lotfollahi, and successful candidates across different seniority levels will be responsible for delivering scientific research projects aligned with the team strategy.

About the Role

Your responsibilities will include designing foundational models leveraging multi-modal readouts, integrating data from various sources, and developing robust AI models. You will work with open-source software, proposing, developing, and maintaining solutions to analyze large-scale single-cell datasets. We have access to unique data and the capacity to generate data for model training, supported by substantial computational and GPU resources.

Our teams have experience generating and analyzing datasets comprising millions of cells across tissues and conditions, including disease and healthy states. This work requires an understanding of how to train large-scale ML models and how to manage large data science projects.

You will be responsible for:

Managing and leading machine learning research projects, publishing outcomes in scientific journals and at conferences (ICLR, ICML, CVPR, etc.).
Collaborating with team members to propose, develop, and evaluate new ML models for understanding single-cell data and applications in drug discovery.
Supervising and training Ph.D. students and postdocs in interdisciplinary scientific problems in biology.
Contributing to scientific publications on biotechnology and biology.
Packaging developed solutions into open-source, user-friendly software with documentation for biologists and bioinformaticians.
Presenting research findings and pipelines to internal and external audiences.

About You

You will be supported in your professional development and have opportunities to lead publications and present at conferences on genetics and genomics in drug discovery.

● Ph.D. or M.Sc. with relevant research experience in fields such as Computer Science, Computational Biology, Genetics, Bioinformatics, Physics, Engineering, or Applied Statistics/Mathematics.

● Prior ML research experience in academic or scientific environments (internships/RA considered).

● Strong Python skills, including libraries like Scikit-Learn, SciPy, TensorFlow, and PyTorch.

● Expertise in designing, training, and deploying ML models.

● Experience handling large datasets, data cleaning, feature engineering, and augmentation.

● Experience with high-performance computing and GPU training.

● Knowledge of NLP and transformer models like BERT and GPT.

● Familiarity with generative models such as diffusion and flow-matching models.

● Good software development practices, including version control and package management.

● Strong problem-solving skills and the ability to communicate complex concepts effectively.

● Evidence of related research work in ML.

Additional skills include:

Ability to understand scientific challenges and break down complex problems.
Adaptability to changing environments and strategic thinking.
Strong networking, influencing, and relationship-building skills.
Commitment to inclusivity and respect.

Relevant publications from the group include:

Mapping single-cell data to reference atlases by transfer learning. Nature Biotechnology.
scGen predicts single-cell perturbation responses. Nature Methods.
Biologically informed deep learning to query gene programs in single cell atlases. Nature Cell Biology.