NLP & LLM Data Scientist - Healthcare & Life Sciences

Lensa

Dover (DE)

Remote

USD 140,000 - 200,000

Full time

30+ days ago

Job summary

An innovative firm in the healthcare sector is seeking a skilled NLP Data Scientist to join their team. This role focuses on utilizing advanced language models and natural language processing techniques to analyze medical data, enhancing drug development processes. As part of a dynamic team, you will collaborate with clinical and data scientists to create impactful solutions that improve patient outcomes. This position offers an exciting opportunity to work at the forefront of technology and healthcare, where your contributions will directly influence the development of life-saving therapies. If you are passionate about data science and healthcare, this is the perfect opportunity for you.

Qualifications

  • 2+ years of experience in NLP and in handling EHR, laboratory, or genetic test data is essential.
  • Proficient in Python and SQL, with strong analytical skills.

Responsibilities

  • Leverage NLP and LLMs to extract and interpret unstructured medical data.
  • Collaborate with scientists to create efficient NLP models for healthcare.

Skills

Natural Language Processing (NLP)
Large Language Models (LLM)
Python
SQL
Data Analysis
Communication Skills
Problem-Solving

Education

Master's or Ph.D. in Computational Biology
Master's or Ph.D. in Data Science
Master's or Ph.D. in Machine Learning

Tools

NLTK
spaCy
Hugging Face Transformers
PyTorch
TensorFlow
AWS Redshift

Job description

NLP & LLM Data Scientist – Healthcare & Life Sciences

Location: Remote, United States

Date Posted: Feb 21, 2025

Employment Type: Full Time

Job ID: R-881

Description

About Norstella

At Norstella, our mission is simple: to help our clients bring life-saving therapies to market more quickly and help patients in need.

Founded in 2022, but with a history going back to 1939, Norstella unites best-in-class brands to help clients navigate the complexities at each step of the drug development life cycle and get the right treatments to the right patients at the right time.

Each organization (Citeline, Evaluate, MMIT, Panalgo, The Dedham Group) delivers must-have answers for critical strategic and commercial decision-making. Together, via our market-leading brands, we help our clients:

  1. Citeline – accelerate the drug development cycle
  2. Evaluate – bring the right drugs to market
  3. MMIT – identify barriers to patient access
  4. The Dedham Group – think strategically for specialty therapeutics

By combining the efforts of each organization under Norstella, we can offer an even wider breadth of expertise, cutting-edge data solutions and expert advisory services alongside advanced technologies such as real-world data, machine learning and predictive analytics.

As one of the largest global pharma intelligence solution providers, Norstella has teams of experts delivering world-class solutions in the USA, the UK, the Netherlands, Japan, China and India.

Job Description:

Norstella Real World Data (RWD) is seeking a skilled NLP Data Scientist with a clinical background and a focus on language models to join our AI & Life Sciences Solutions team. Your expertise in processing and understanding natural language data, along with your knowledge of Electronic Health Records (EHR) and laboratory report analysis, will be instrumental in driving our data science initiatives and innovations, particularly the development of rich multimodal real-world datasets that expedite RWD-driven drug development in pharma.

Responsibilities:

  1. Use NLP and open-source Large Language Models (LLMs) such as Llama 2, Mixtral, Qwen, and BERT to extract, process, and interpret unstructured medical data from diverse sources such as EHRs, medical notes, and laboratory reports.
  2. Collaborate with clinical scientists and data scientists to create efficient NLP models for healthcare, exhibiting an understanding of both the technical and medical aspects of the data.
  3. Conduct data cleaning, preprocessing, and validation to maintain the accuracy and reliability of insights gathered from NLP processes.
  4. Validate and present data findings to stakeholders, exhibiting clear and effective communication skills.

Qualifications:

  1. Master's or Ph.D. degree in Computational Biology, Computer Science, Data Science, Computational Linguistics, Machine Learning, or a related analytical field.
  2. Deep understanding and direct experience (2+ years) in handling and interpreting either Electronic Health Records (EHR) and laboratory test results or genetic test results is a must.
  3. Proven experience (2+ years) in NLP, with strong knowledge of NLP techniques such as Named Entity Recognition (NER), text summarization, and topic modeling, and their applied use in healthcare.
  4. Expert-level understanding of and practical experience (1+ years) with open-source Large Language Models (e.g., Llama 2/3, Mixtral), including prompt engineering, inference, and fine-tuning.
  5. Proficient in Python and SQL, with strong experience in NLP libraries such as NLTK, spaCy, and Hugging Face Transformers, and deep learning libraries such as PyTorch and TensorFlow.
  6. Familiarity with common data science and ML practices, e.g., version control systems, agile methodologies, and documentation.
  7. Experience working in the AWS cloud environment and with large databases (e.g., AWS Redshift).
  8. Experience managing the ML lifecycle using open-source tools (e.g., MLflow).
  9. Detail-oriented with strong analytical and problem-solving abilities.
  10. Excellent verbal and written communication skills, with the ability to present complex data to a non-technical audience.

Preferred Qualifications:

  1. Experience dealing with protected health information (PHI) and familiarity with healthcare-related data privacy laws such as HIPAA.
  2. Familiarity with standard healthcare codes and terminologies such as ICD-10, CPT, LOINC, and SNOMED CT.
  3. Experience with Retrieval-Augmented Generation (RAG) and vector stores for storing and querying large volumes of unstructured healthcare documents.

Compensation:

The expected base salary for this position ranges from $140,000 to $200,000. It is not typical for offers to be made at or near the top of the range. Salary offers are based on a wide range of factors including relevant skills, training, experience, education, and, where applicable, licensure or certifications obtained.

Norstella is an equal opportunity employer and does not discriminate on the grounds of gender, sexual orientation, marital or civil partner status, pregnancy or maternity, gender reassignment, race, color, nationality, ethnic or national origin, religion or belief, disability or age. Our ethos is to respect and value people’s differences, to help everyone achieve more at work as well as in their personal lives so that they feel proud of the part they play in our success.
