About EMBO
EMBO is a not-for-profit organization dedicated to promoting excellence in the life sciences in Europe and beyond. EMBO currently comprises a community of more than 2,100 EMBO Members. We fund talented researchers at all career stages, facilitate scientific exchanges through high-quality academic publishing, conferences, and lectures, and foster a research environment where scientists can achieve their best work.
Your role
We seek a trainee with expertise in data science, machine learning and data visualization to develop AI-assisted tools that establish vector-based representations of researchers' expertise and analyze them at scale in the context of the global scientific landscape. The trainee will work with developers from the Open Science Implementation team and in collaboration with the Membership & Elections team to deliver focused analyses. They will report to the Head of Open Science Implementation and the Head of Membership and Elections.
We want to derive data-driven insights into the composition of our membership and other relevant scientific communities (authors, reviewers, conference speakers, grant applicants, etc.) in terms of their areas of expertise, as compared to the global life science landscape. To this end, we are developing AI-assisted tools to represent the scientific expertise of scientists, journals and conferences. The tools currently in development use NLP techniques, including tailored large language model embeddings, automated keyword and concept extraction, and graph-based data analysis. The end goal of the project is to share the results of the analyses with our community through advanced visualization methods.
You have
- Experience in data science or machine learning, or you hold or are about to obtain a degree or certified training in a closely related field in computational sciences;
- Experience in structured, object-oriented programming in Python;
- Experience with multi-dimensional data processing, analytics and visualization (for example, NumPy, SciPy, scikit-learn, ...);
- Knowledge of text processing (spaCy, NLTK) and some of the major transformer-based frameworks (Hugging Face, LangChain, LlamaIndex, OpenAI or Anthropic APIs);
- The ability to work both autonomously and as part of a team.
You may also have
- Contributed to open source projects;
- Experience with vector stores;
- Experience in training or fine-tuning models with a major machine learning framework (PyTorch, TensorFlow);
- Prior exposure to life sciences, computational biology or bioinformatics.
Contract length
- 6 months, renewable to 12 months total
Stipend
- 1500 euros / month