A research institution in the UK seeks a Principal Research Data Scientist to advance biological research through machine learning. You will lead projects, collaborate with interdisciplinary teams, and manage large datasets. Ideal candidates have strong Python skills and experience in ML model development. Opportunities to publish and to present at conferences are available.
Do you want to help us improve human health and understand life on Earth? Make your mark by shaping the future to enable or deliver life-changing science to solve some of humanity’s greatest challenges.
We seek a Principal Machine Learning Research Data Scientist to join a collaborative project between the Wellcome Sanger Institute and Open Targets. This project aims to leverage datasets generated at the Sanger Institute and publicly available data from human cells to create foundational models for biology, enhancing our understanding of life's rules and improving health for all. You will work within an interdisciplinary team of life scientists and computer/ML scientists, with a shared objective of advancing biological research through these models. This role will sit within the AI/ML Faculty group led by Dr. Mohammad Lotfollahi, and the successful candidates, across different seniority levels, will be responsible for delivering their scientific research projects as part of the broader team strategy.
About the Role
Your role will involve designing foundational models leveraging multi-modal readouts, integrating and processing data from various sources to develop robust AI models. You will work with open-source software, proposing, developing, and maintaining solutions to analyze large-scale single-cell datasets. We have access to unique data and the capacity to generate data for training models, supported by substantial computational power and GPU resources.
Our teams are experienced in generating and analyzing datasets that span millions of cells across tissues and conditions (e.g., diseased and healthy). Working with data at this scale requires a detailed understanding of training large-scale ML models and a track record of large data-science projects.
You will be responsible for:
About You:
You will be supported in your development and have opportunities to lead publications and present at conferences on genetics and genomics in drug discovery.
● Ph.D. or M.Sc. with relevant research experience in fields like Computer Science, Computational Biology, Genetics, Bioinformatics, Physics, Engineering, or Applied Mathematics
● Previous ML research experience in academic or scientific environments (including research assistantships and internships)
● Strong Python skills, including libraries like Scikit-Learn, SciPy, TensorFlow, and PyTorch
● Experience in designing, training, and deploying ML models
● Experience handling large datasets using techniques such as data cleaning, feature engineering, and augmentation
● Experience with high-performance computing environments and GPUs
● Knowledge of NLP and transformer models like BERT and GPT
● Familiarity with generative models such as diffusion and flow matching
● Good software development practices and familiarity with collaboration tools (Git, Python packaging, code reviews)
● Strong analytical and problem-solving skills
● Excellent communication skills for explaining complex ML concepts to non-technical stakeholders
In addition, you should demonstrate:
Relevant publications from the group include: