A leading research institution seeks a Principal Machine Learning Research Data Scientist to join a collaborative project aimed at leveraging datasets to enhance biological research. The role involves designing foundational models, managing ML projects, and contributing to publications in high-impact scientific journals, all within a supportive and innovative environment.
Do you want to help us improve human health and understand life on Earth? Make your mark by shaping the future to enable or deliver life-changing science to solve some of humanity’s greatest challenges.
We seek a Principal Machine Learning Research Data Scientist to join a collaborative project between the Wellcome Sanger Institute and Open Targets. This project aims to leverage datasets generated at the Sanger Institute and publicly available data from human cells to create foundational models for biology, enhancing our understanding of life's rules and improving health for all. You will work within an interdisciplinary team of life scientists and computer/ML scientists, with a shared goal of advancing biological research through these models. This role will be within the AI/ML Faculty group led by Dr. Mohammad Lotfollahi, and successful candidates across different seniority levels will be responsible for delivering scientific research projects aligned with the team strategy.
About the Role
Your responsibilities will include designing foundational models leveraging multi-modal readouts, integrating data from various sources, and developing robust AI models. You will work with open-source software, proposing, developing, and maintaining solutions to analyze large-scale single-cell datasets. We have access to unique data and the capacity to generate data for model training, supported by substantial computational and GPU resources.
Our teams have experience generating and analyzing datasets comprising millions of cells across tissues and conditions, including both diseased and healthy states. This work requires an understanding of how to train large-scale ML models and how to manage large data science projects.
You will be responsible for:
About You
You will be supported in your professional development and have opportunities to lead publications and present at conferences on genetics and genomics in drug discovery.
● Ph.D. or M.Sc. with relevant research experience in fields such as Computer Science, Computational Biology, Genetics, Bioinformatics, Physics, Engineering, or Applied Statistics/Mathematics.
● Prior ML research experience in academic or scientific environments (internships/RA considered).
● Strong Python skills, including libraries such as scikit-learn, SciPy, TensorFlow, and PyTorch.
● Expertise in designing, training, and deploying ML models.
● Experience handling large datasets, data cleaning, feature engineering, and augmentation.
● Experience with high-performance computing and GPU training.
● Knowledge of NLP and transformer models like BERT and GPT.
● Familiarity with generative models such as diffusion models and flow matching.
● Good software development practices, including version control and package management.
● Strong problem-solving skills and the ability to communicate complex concepts effectively.
● Evidence of related research work in ML.
Additional skills include:
Relevant publications from the group include: