AI / Data Engineer

Business Umbrella

Abu Dhabi

On-site

AED 180,000 - 220,000

Full time

2 days ago
Job summary

A leading data engineering firm located in Abu Dhabi is seeking a highly experienced Data Engineer to build ingestion pipelines and manage AI-related datasets. The ideal candidate will have over 10 years in data engineering and experience with AI-adjacent data roles. Proficiency in Python and relevant libraries is essential. This role involves creating high-quality datasets, applying versioning for reproducibility, and optimizing data workflows for advanced LLM applications.

Qualifications

  • 10 years of experience in a Data Engineering role.
  • 2 years of experience in an AI-adjacent data role.
  • Proficiency in Python, pandas, and text processing tools.

Responsibilities

  • Build ingestion pipelines for structured/unstructured data using Python.
  • Clean, normalize, and prepare data in formats suitable for LLM fine-tuning.
  • Create high-quality task-specific datasets for training and evaluation.
  • Apply versioning to datasets for reproducibility.
  • Generate embeddings using HuggingFace or Sentence Transformers.
  • Manage vector indexes and optimize retrieval workflows.
  • Tokenize and chunk long-form data for context window optimization.

Skills

Python
Data engineering
Text processing tools
Versioning with DVC or LakeFS
Tokenization libraries
Managing datasets
Understanding LLM data constraints

Job description

Responsibilities
  • Build ingestion pipelines for structured/unstructured data using Python
  • Clean, normalize, and prepare data in formats suitable for LLM fine-tuning (e.g., JSONL, CSV)
  • Create high-quality, task-specific datasets for training and evaluation
  • Apply versioning to datasets using DVC or LakeFS for reproducibility
  • Generate embeddings using HuggingFace or Sentence Transformers
  • Manage vector indexes (FAISS, Weaviate) and optimize retrieval workflows
  • Tokenize and chunk long-form data for context window optimization
Requirements
  • 10 years of experience in a Data Engineering role
  • 2 years of experience in an AI-adjacent data role
  • Proficiency in Python, pandas, and text processing tools
  • Familiarity with tokenization libraries (HuggingFace Tokenizers, SentencePiece)
  • Experience managing datasets and object storage (MinIO, NFS)
  • Understanding of LLM data constraints (context windows, formatting, prompt injection)