NLP Data Scientist / Scientific Data Engineer

European Bioinformatics Institute (EMBL-EBI)

Cambridge, United Kingdom

Hybrid

GBP 3,303 - 3,695 per month after tax (Grade 5-6)

Full time

Yesterday

Job summary

A leading research institute in the UK is seeking a skilled professional to develop machine learning pipelines that extract drug side effects from documents. You will collaborate with pharmaceutical partners and work closely with domain experts. The position requires a PhD, Master's or equivalent experience, proficiency in Python, and knowledge of NLP methods. The role offers a competitive salary, benefits, and a hybrid working arrangement that balances on-site and remote work.

Benefits

Private medical insurance
30 days annual leave
Flexible working arrangements
Relocation package
Campus amenities including gym and library

Qualifications

  • PhD, Master's or equivalent experience in a relevant field.
  • Experience with document and text preprocessing techniques.
  • Knowledge of cheminformatics resources and/or bioinformatics databases.

Responsibilities

  • Develop machine learning pipelines for extracting drug side effects.
  • Propose ideas for data extraction methods and pipelines.
  • Collaborate with the Open Targets Partners to validate methods.

Skills

Experience with language models
Proficiency in Python
Attention to detail
Strong communication skills
Team-oriented collaboration

Education

PhD or Master's in computational linguistics, computer science, bioinformatics, or cheminformatics

Tools

PySpark
Pandas

Job description

The position is embedded within the Chemical Biology Services team at EMBL-EBI and the Open Targets Safety 2.0 project. You will work closely with safety scientists from Open Targets pharmaceutical partners (MSD, Genentech, GSK, Pfizer, Sanofi), ensuring delivery of work packages and seamless integration of pipelines into ChEMBL and the Open Targets Platform.

Responsibilities

  • Develop machine learning pipelines for extracting drug side effects from drug labels, clinical trials, publications and other documents
  • Investigate modern NLP methodologies and propose ideas for the implementation of data extraction methods and pipelines
  • Apply language models to extract and map drug-related information from unstructured text, e.g. from the scientific literature and ClinicalTrials.gov
  • Implement and/or fine‑tune different NLP models, e.g. NER models, transformer models, LLMs
  • Integrate project workflows with existing infrastructures in the EBI Chemical Biology Services and Open Targets teams
  • Prepare and evaluate benchmark datasets from the open domain as training sets for NLP models
  • Work with domain experts to develop new gold standards for NLP tasks where needed
  • Assist with and/or perform data curation to prepare clean and reliable training sets
  • Apply and/or adapt existing methods for mapping extracted entities to biomedical ontologies, e.g. drugs, side effects/phenotypes, and diseases
  • Work closely with Safety 2.0 project group members bridging the ChEMBL and Open Targets teams
  • Work closely with the Open Targets Core team to ensure seamless integration of data and workflows into the Open Targets Platform and long‑term sustainability
  • Collaborate with the Open Targets Partners to assess, prioritise, validate and refine the developed methods
  • Disseminate the outcomes of the project to the scientific community and stakeholders through presentations and publications

Qualifications

  • PhD, Master's or equivalent experience in computational linguistics, computer science, bioinformatics, or cheminformatics
  • Experience with language models, e.g. transformer models, LLMs, and AI agents for information extraction
  • Experience with document and text preprocessing, cleaning and transformation techniques including mapping to ontologies
  • Experience with data structures, data models and databases
  • Knowledge of cheminformatics resources and/or bioinformatics databases
  • Knowledge of data analysis and machine learning
  • Proficiency in Python
  • Knowledge of data frameworks, e.g. PySpark, pandas, Polars
  • Excellent attention to detail
  • Strong presentation and verbal communication skills
  • Experience working and collaborating in a team-oriented environment
  • Ability to work independently, manage time, and meet deadlines

Preferred Experience

  • Experience with the application of NLP methods to cheminformatics and/or biomedical domains
  • Experience with version control
  • Experience in safety/toxicology in industry or research

Location, Contract & Salary

Location: EMBL-EBI, Cambridge, United Kingdom.
Contract length: 3 years (project based).
Salary: Grade 5-6 (monthly £3,303 to £3,695 after tax, excluding pension and insurances).
Closing date: 11/01/2026.
Hybrid Working : At EMBL‑EBI we embrace a hybrid approach – team members are typically on site at least three days a week, with a desk always available.

Interview Process

Introductory meetings will be held remotely starting in February 2026.

Why Join Us

EMBL‑EBI, part of the European Molecular Biology Laboratory, is a world‑leading research centre for large biological data. Enjoy a collaborative, inclusive culture, flexible working and a wide range of on‑site and remote facilities.

Benefits

  • Financial incentives: monthly family, child and non-resident allowances, annual salary review, pension scheme, death benefit, long-term care, accident-at-work and unemployment insurances
  • Flexible working arrangements, including hybrid patterns
  • Private medical insurance for you and your immediate family (including prescriptions, dental and optical cover)
  • Generous time off: 30 days annual leave per year plus public holidays
  • Relocation package including installation grant (if required)
  • Campus life: free shuttle bus, on-site library, subsidised gym and cafeteria, casual dress code, sports and social club activities (on campus or remotely)
  • Family benefits: on-site nursery, 10 days child sick leave, generous parental leave, holiday clubs on campus and monthly family and child allowances
  • Benefits for non-UK residents: visa exemption, education grant for private schooling, financial support to travel back home every second year and a monthly non-resident allowance