The Wellcome Sanger Institute seeks a Senior Data Scientist to lead machine learning projects aimed at enhancing the understanding of biological data. Successful candidates will join an interdisciplinary team and contribute to innovative research in genomics and drug discovery, utilizing extensive datasets and computational resources.
Do you want to help us improve human health and understand life on Earth? Make your mark by shaping the future to enable or deliver life-changing science to solve some of humanity’s greatest challenges.
We seek a senior machine learning research scientist to join a collaborative project between the Wellcome Sanger Institute and Open Targets. This project aims to leverage datasets generated internally at the Sanger Institute and publicly available data from human cells to create foundational models for biology, enhancing our understanding of life's rules and improving health for all. You will work within an interdisciplinary team of life scientists and computer/ML scientists, with a shared objective of advancing biological research through these foundational models. This role will sit within the AI/ML Faculty group led by Dr. Mohammad Lotfollahi, and the successful candidates, across different seniority levels (senior and principal), will be responsible for delivering their portfolio of scientific research projects as part of the broader team strategy.
About the role
Your role will involve designing foundational models leveraging multi-modal readouts. This includes integrating and processing data from various sources to develop robust and versatile AI models. To achieve this, you will work with open-source software, proposing, developing, and maintaining new solutions to analyze and interpret large-scale single-cell datasets. We have access to unique data and are also in the position to generate data to train unique models. Additionally, we have substantial computational power and GPU resources to train large models efficiently.
Our teams are well-positioned to tackle this problem, with experience in both generating and analyzing datasets that comprise millions of cells across multiple tissues and conditions (e.g., diseased and healthy). The work calls for a detailed understanding of how large-scale ML models are trained and a track record of delivering large data-science projects.
You will be responsible for:
About You:
You will be supported in your personal and professional development and have the opportunity to lead peer-reviewed publications around using genetics and genomics approaches to guide drug discovery and present them at national and international conferences.
● Ph.D. or M.Sc. with equivalent research experience in a relevant quantitative discipline (e.g., Computer Science, Computational Biology, Genetics, Bioinformatics, Physics, Engineering, or Applied Statistics/Mathematics)
● Previous ML work experience in a scientific or academic environment (research assistantships and internships count as work experience)
● Strong knowledge of Python, including core data science libraries such as Scikit-Learn, SciPy, TensorFlow, and PyTorch.
● Expertise in machine learning algorithms and frameworks, with experience in designing, training, and deploying ML models.
● Proficiency in handling and processing large datasets, including techniques for data cleaning, feature engineering, and data augmentation.
● Experience with high-performance computing environments, including the use of GPUs for training large-scale machine learning models.
● Experience in natural language processing (NLP) and training models based on transformer architectures, such as BERT and GPT.
● Familiarity with generative models such as diffusion models and flow matching.
● Knowledge of good software development practices and collaboration tools, including git-based version control, Python package management, and code reviews.
● Strong problem-solving skills with the ability to analyze complex data and derive actionable insights.
● Excellent communication skills, with the ability to explain complex machine learning algorithms and statistical methods to non-technical stakeholders.
In addition to the above technical skills, you will also have the following:
Relevant publications from the groups: