Principal Data Scientist

Wellcome Sanger Institute

Hinxton

On-site

GBP 60,000 - 80,000

Full time

30+ days ago

Job summary

A research institution in the UK seeks a Principal Research Data Scientist to advance biological research through machine learning. You will lead projects, collaborate with interdisciplinary teams, and manage large datasets. Ideal candidates have strong Python skills and experience in ML model development. Opportunities to publish and present at conferences are available.

Qualifications

  • Ph.D. or M.Sc. with relevant research experience in fields like Computer Science, Computational Biology, Genetics, etc.
  • Strong experience in machine learning research.
  • Ability to handle large datasets.

Responsibilities

  • Managing and leading machine learning research projects.
  • Collaborating to evaluate machine learning models.
  • Supervising and training Ph.D. students and postdocs.

Skills

Python
Machine Learning
Data Analysis
Scikit-Learn
TensorFlow
PyTorch
NLP
Data Cleaning
Feature Engineering
Generative Models

Education

Ph.D. or M.Sc. in relevant field

Tools

High-performance computing
Git

Job description

Do you want to help us improve human health and understand life on Earth? Make your mark by shaping the future to enable or deliver life-changing science to solve some of humanity’s greatest challenges.

Principal Research Data Scientist

We seek a Principal Machine Learning Research Data Scientist to join a collaborative project between the Wellcome Sanger Institute and Open Targets. This project aims to leverage datasets generated at the Sanger Institute and publicly available data from human cells to create foundational models for biology, enhancing our understanding of life's rules and improving health for all. You will work within an interdisciplinary team of life scientists and computer/ML scientists, with a shared objective of advancing biological research through these models. This role will sit within the AI/ML Faculty group led by Dr. Mohammad Lotfollahi, and the successful candidates, across different seniority levels, will be responsible for delivering their scientific research projects as part of the broader team strategy.

About the Role

Your role will involve designing foundational models leveraging multi-modal readouts, integrating and processing data from various sources to develop robust AI models. You will work with open-source software, proposing, developing, and maintaining solutions to analyze large-scale single-cell datasets. We have access to unique data and the capacity to generate data for training models, supported by substantial computational power and GPU resources.

Our teams are experienced in generating and analyzing datasets, including millions of cells across tissues and conditions (e.g., disease, healthy). Working with data at this scale requires a detailed understanding of training large-scale ML models and a track record of large data-science projects.

You will be responsible for:

  • Managing and leading machine learning research projects and publishing results in scientific journals or conferences (ICLR, ICML, CVPR, etc.)
  • Collaborating with team members to propose, develop, and evaluate machine learning models for understanding single-cell data and drug discovery applications
  • Supervising and training Ph.D. students and postdocs in interdisciplinary scientific problems in biology
  • Writing scientific papers on biotechnology and biology
  • Distilling solutions into open-source, user-friendly packages with documentation for biologists and bioinformaticians
  • Presenting research and pipelines to internal and external audiences

About You

You will be supported in your development and have opportunities to lead publications and present at conferences on genetics and genomics in drug discovery.

  • Ph.D. or M.Sc. with relevant research experience in fields like Computer Science, Computational Biology, Genetics, Bioinformatics, Physics, Engineering, or Applied Mathematics
  • Previous ML research experience in academic or scientific environments (including RA/Internships)
  • Strong Python skills, including libraries like Scikit-Learn, SciPy, TensorFlow, and PyTorch
  • Experience in designing, training, and deploying ML models
  • Handling large datasets with techniques like data cleaning, feature engineering, and augmentation
  • Experience with high-performance computing environments and GPUs
  • Knowledge of NLP and transformer models like BERT and GPT
  • Familiarity with generative models such as diffusion and flow matching
  • Good software development practices and collaboration tools (git, Python packaging, code reviews)
  • Strong analytical and problem-solving skills
  • Excellent communication skills for explaining complex ML concepts to non-technical stakeholders
  • Evidence of research experience in machine learning

In addition, you should demonstrate:

  • Ability to understand complex scientific and technical challenges and break them down into actionable steps
  • Flexibility to adapt in a changing environment
  • Effective workload management and timely delivery
  • Networking, influencing, and relationship-building skills
  • Strategic thinking and seeing the bigger picture
  • Ability to build collaborative relationships at all levels
  • Respect and inclusivity

Relevant publications from the group include:

  • Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nature Biotechnology.
  • Lotfollahi, M. et al. scGen predicts single-cell perturbation responses. Nature Methods.
  • Lotfollahi, M. et al. Biologically informed deep learning to query gene programs in single cell atlases. Nature Cell Biology.