Senior / Principal Data Scientist - NLP (Remote) - Spain

Veeva Systems

Barcelona

Remote

EUR 50.000 - 75.000

Full-time

Posted 11 days ago

Job Description

A leading company in life sciences seeks a Data Scientist to develop LLM-based agents for extracting KOL data. This role involves designing and implementing information extraction pipelines, utilizing the latest NLP technologies, and collaborating with cross-functional teams. The ideal candidate will have extensive experience in NLP and machine learning, particularly in the medical domain, and will thrive in a fast-paced, innovative environment. The position allows for remote work from Spain, the UK, or the Netherlands.

Benefits

Fitness reimbursement

Life insurance

Pension fund

Qualifications

4+ years of experience as a data scientist or 2+ years with a Ph.D.
Strong theoretical knowledge of NLP, ML, and Deep Learning.
Proven experience with large language models.

Responsibilities

Develop LLM-based agents for KOL data extraction.
Implement pipelines for extracting information from unstructured data.
Collaborate with software developers and DevOps engineers.

Skills

NLP

Machine Learning

Deep Learning

Python

Collaboration

Education

Ph.D. in Computer Science

Master's in AI

Tools

NLTK

SpaCy

Hugging Face Transformers

PyTorch

Docker

Kubernetes

Ray

Spark

AWS

GCP

Azure

Veeva is a mission-driven organization that aspires to help our customers in Life Sciences and Regulated industries bring their products to market faster. We are shaped by our values: Do the Right Thing, Customer Success, Employee Success, and Speed. Our teams develop transformative cloud software, services, consulting, and data to make our customers more efficient and effective in everything they do. Veeva is a work-anywhere company, allowing employees to work from home, at a customer site, or in an office on any given day. As a company focused on making a positive impact, we are committed to connecting life sciences and key people to improve research and care. Our product offers real-time academic, social, and medical data to build comprehensive profiles, helping industry partners find the right experts to accelerate therapeutics development and adoption, ultimately helping patients receive urgent care sooner.

Role Overview

Your role involves developing LLM-based agents specialized in searching and extracting detailed information about Key Opinion Leaders (KOLs) in healthcare. You will craft an end-to-end human-in-the-loop pipeline to analyze unstructured medical documents, perform semantic searches, and provide precise answers to queries concerning KOL data across languages and disciplines. Utilizing cloud infrastructure, you will build models for information extraction and question answering, collaborating with software developers and DevOps engineers to deploy these models into production. We aim to develop new algorithms that redefine industry standards for quality versus quantity, training ML models with the help of over 2000 curators to meet both quality and scale requirements across regions, languages, and medical specialties.

Work Location

You can work remotely from anywhere in the Netherlands, the UK, or Spain, provided you are a resident and legally authorized to work there without visa or relocation support. If you believe you are an exceptional candidate outside these conditions, please specify in a separate note for consideration.

What You'll Do

Adopt the latest NLP technologies and trends for your platform.
Apply Reinforcement Learning from Human Feedback (RLHF) and related preference-optimization methods, such as PPO and DPO, to train LLMs on human preferences.
Design, develop, and implement pipelines for extracting information from large-scale, unstructured, multi-domain, and multilingual data.
Create robust semantic search functionalities to effectively answer user queries.
Develop and utilize techniques like named entity recognition, entity linking, slot-filling, few-shot learning, active learning, question answering, and dense passage retrieval for information extraction and machine reading.
Analyze data models per source and region, and interpret model decisions.
Collaborate with data quality teams to define annotation metrics and evaluate model performance qualitatively and quantitatively.
Use cloud infrastructure for model development and collaborate with development and DevOps teams for deployment.

Requirements

4+ years of experience as a data scientist or 2+ years with a Ph.D. in Computer Science, AI, Computational Linguistics, or related fields.
Strong theoretical knowledge of NLP, ML, and Deep Learning.
Proven experience with large language models and transformer architectures such as GPT and BERT.
Experience with large-scale data processing, preferably in the medical domain.
Proficiency in Python and NLP libraries such as NLTK, SpaCy, and Hugging Face Transformers.
Experience with big data frameworks (Ray, Spark) and deep learning frameworks (PyTorch, JAX).
Experience with cloud platforms (AWS, GCP, Azure), containerization (Docker, Kubernetes), and scripting.
Strong collaboration and communication skills, and experience in start-up environments.
Social competence, team spirit, high energy, and ambition.

Nice to Have

Background in Medical NLP.
Experience with training and serving LLMs.
Industry experience in life/health sciences, especially pharma.
Peer-reviewed publications in AI.
Leadership skills and networking for team growth.
Experience with NoSQL databases like MongoDB.
Familiarity with model registry solutions like MLflow.
Experience with distributed computing platforms such as Ray and Spark.

Benefits include fitness reimbursement, life insurance, and a pension fund.

Veeva’s headquarters is in the San Francisco Bay Area, with offices in over 15 countries. We are committed to diversity and inclusion. If you need accommodations during the application process due to a disability or other needs, please contact us.
