Senior Data Scientist LLM

MULTIVERSE COMPUTING

Basque Country

Hybrid

EUR 40,000 - 70,000

Full-time

Posted 26 days ago

Job Summary

Multiverse Computing, a leading Quantum Software company, is seeking a Senior Data Scientist to design data strategies for training Large Language Models. This role involves developing scalable data pipelines and ensuring data quality, working in a dynamic and inclusive environment focused on innovation and sustainability.

Benefits

Signing bonus
Relocation package
Private health insurance
Educational budget eligibility
Language classes
Discounted lunch options
Career plan opportunities

Background

  • 3 years of experience in data science or related roles.
  • In-depth knowledge of the LLM lifecycle.
  • Hands-on experience with text data collection.

Responsibilities

  • Design and implement strategies for dataset creation.
  • Develop scalable pipelines for data collection.
  • Conduct data audits for quality and compliance.

Skills

Data Quality Metrics
Python
Data Tooling Ecosystems
Dataset Creation
NLP

Education

Bachelor's, Master's, or Ph.D. in Computer Science, AI, or Data Science

Tools

Pandas
NumPy
spaCy
Hugging Face Datasets & Transformers

Job Description

Multiverse is a well-funded and fast-growing deep-tech company founded in 2019. We are the biggest Quantum Software company in the EU. We are also one of the 100 most promising companies in AI in the world (according to CB Insights, 2023), with 150 employees and growing, fully multicultural and international.

We provide hyper-efficient software to companies seeking to gain an edge with quantum computing and artificial intelligence. Our main products, Singularity and CompactifAI, address critical needs across various industries. Singularity remains a trusted solution for blue-chip companies in finance, energy, manufacturing, cybersecurity, and more. CompactifAI, on the other hand, is a groundbreaking compression tool for foundation models that uses Tensor Networks to drastically compress AI systems such as large language models, making them efficient and portable.

You will be working alongside world-leading experts to build solutions that tackle real-life issues. We look for passionate people who want to grow in an ethics-driven environment that promotes sustainability and diversity. We aim to continue building our truly inclusive culture. Come and join us.

We are seeking a Senior Data Scientist with deep expertise in creating high-quality datasets for training and fine-tuning Large Language Models (LLMs). You will be responsible for designing and implementing scalable data pipelines and strategies to support all stages of LLM development: pretraining, supervised fine-tuning, and reinforcement learning from human feedback (RLHF).

This role is critical to ensuring the robustness, safety, and alignment of our AI models. You will have the autonomy to explore innovative data sourcing and curation methods and the opportunity to directly influence the capabilities of state-of-the-art LLMs.

As a Senior Data Scientist, you will:

  • Design and implement strategies for creating, sourcing, and augmenting datasets tailored for LLM training and fine-tuning.
  • Develop scalable pipelines to collect, clean, filter, annotate, and validate large volumes of text data.
  • Conduct data audits to ensure quality, diversity, ethical compliance, and bias mitigation.
  • Collaborate with ML engineers and researchers to align datasets with training objectives and model evaluation needs.
  • Use techniques such as active learning, synthetic data generation, and self-supervised learning to maximize dataset efficiency.
  • Leverage human-in-the-loop (HITL) workflows for data labeling and validation where necessary.
  • Contribute to building data documentation and metadata standards (e.g., Datasheets for Datasets).
  • Keep up to date with research trends in dataset curation, LLM pretraining data, and benchmarking.

Required Qualifications

  • Bachelor's, Master's, or Ph.D. in Computer Science, AI, Data Science, or a related field.
  • 3 years of experience in data science, machine learning, or related roles, with demonstrated experience in dataset creation for NLP or LLMs.
  • In-depth knowledge of the LLM lifecycle: pretraining, fine-tuning, alignment, and evaluation.
  • Proficient in Python and data tooling ecosystems (Pandas, NumPy, spaCy, Hugging Face Datasets & Transformers).
  • Hands-on experience with text data collection from diverse sources: web scraping, APIs, proprietary corpora, etc.
  • Strong understanding of data quality metrics, including bias detection, toxicity, and readability.
  • Experience working with annotation tools (e.g., Prodigy, Label Studio) and managing annotation teams or workflows.

Preferred Qualifications

  • Experience building or contributing to datasets used in LLM pretraining or supervised fine-tuning.
  • Familiarity with RLHF workflows and alignment techniques (e.g., preference modeling, reward modeling).
  • Exposure to multilingual and low-resource language datasets.
  • Contributions to open-source datasets, tools, or publications in dataset-centric research.
  • Knowledge of ethical AI, data governance, privacy laws (e.g., GDPR), and responsible data use.

What We Offer

  • Indefinite contract.
  • Signing bonus.
  • Work visa sponsorship (if applicable).
  • Relocation package (if applicable).
  • Private health insurance.
  • Eligibility for educational budget according to internal policy.
  • Hybrid opportunity.
  • Language classes and discounted lunch options.
  • A fast-paced environment working on cutting-edge technologies.
  • Career plan. Opportunity to learn and teach.
  • Progressive company. Happy people culture.

As an equal opportunity employer, Multiverse Computing is committed to building an inclusive workplace. The company welcomes people from all different backgrounds, including age, citizenship, ethnic and racial origins, gender identities, individuals with disabilities, marital status, religions and ideologies, and sexual orientations, to apply.

Employment Type: Full Time

Experience: years

Vacancy: 1