AI Data Engineer for LLM Pipelines & Embeddings

Business Umbrella

Abu Dhabi

On-site

AED 180,000 - 220,000

Full time

2 days ago

Job summary

A leading data engineering firm located in Abu Dhabi is seeking a highly experienced Data Engineer to build ingestion pipelines and manage AI-related datasets. The ideal candidate will have over 10 years in data engineering and experience with AI-adjacent data roles. Proficiency in Python and relevant libraries is essential. This role involves creating high-quality datasets, applying versioning for reproducibility, and optimizing data workflows for advanced LLM applications.

Qualifications

  • 10 years of experience in a data engineering role.
  • 2 years of experience in an AI-adjacent data role.
  • Proficiency in Python, pandas, and text processing tools.

Responsibilities

  • Build ingestion pipelines for structured/unstructured data using Python.
  • Clean, normalize, and prepare data in formats suitable for LLM fine-tuning.
  • Create high-quality task-specific datasets for training and evaluation.
  • Apply versioning to datasets for reproducibility.
  • Generate embeddings using Hugging Face or Sentence Transformers.
  • Manage vector indexes and optimize retrieval workflows.
  • Tokenize and chunk long-form data to fit within model context windows.

Skills

Python
Data engineering
Text processing tools
Versioning with DVC or LakeFS
Tokenization libraries
Managing datasets
Understanding LLM data constraints