Senior Machine Learning Engineer
Candidate profile (m/f):
- Design and implement strategies for creating, sourcing, and augmenting datasets tailored for LLM training and fine‑tuning.
- Develop scalable pipelines to collect, clean, filter, annotate, and validate large volumes of text data, ensuring quality and ethical compliance.
- Collaborate with ML engineers, researchers, and software engineers to achieve ambitious goals in the preparation of LLMs and complementary work such as dataset preparation, model evaluation, and model serving.
- Develop and integrate new routines for modifying and enhancing LLMs, extending their functionality.
- Make effective use of distributed compute resources and clusters (GPUs), identifying opportunities for further optimization.
- Lead the end‑to‑end preparation of compressed and specialized LLMs for use in production.
- Stay current with research trends in LLM foundation models, dataset curation, pre‑training data, and benchmarking.
- Contribute to documentation and development standards, and help maintain a healthy shared code base.
- Mentor other engineers and share knowledge of cutting‑edge techniques.
You will join a European deep‑tech leader in quantum and AI, in a hybrid role based in Zaragoza.
Qualifications
- Master's or Ph.D. in Computer Science, AI, Data Science, Physics, Math, or a related field (or equivalent industry experience).
- 4+ years of experience in data science, machine learning, or related roles, including demonstrated work with NLP or LLMs.
- In‑depth knowledge of large foundation model architectures (language and multimodal models) and their lifecycle: training, fine‑tuning, alignment, and evaluation.
- Proficiency in Python and its data tooling ecosystem (Pandas, NumPy, and the Hugging Face Datasets and Transformers libraries).
- Hands‑on experience with text data collection from diverse sources: web scraping, APIs, proprietary corpora, etc.
- Strong understanding of data quality metrics including bias detection, toxicity, and readability.
- Experience in large shared distributed computing environments and familiarity with tools for hardware optimization (vLLM, TensorRT, NeMo, etc.).
- Experience with version control (git), unit testing, and core software development practices.
- Fluency in English and Spanish.
Compensation & Benefits
- Competitive salary.
- Two one‑time bonuses: signing and retention.
- Fixed‑term contract with the possibility of becoming permanent.
- Hybrid role and flexible working hours.
- Opportunity to be part of an organization focused on technology innovation.
Keywords: Machine Learning, LLM, Python, Pandas, NumPy, Hugging Face