AI / Data Engineer

Business Umbrella

Abu Dhabi

On-site

AED 120,000 - 140,000

Full time

28 days ago

Job summary

A leading company is seeking a Data Engineer with extensive experience in building data pipelines and managing datasets for AI applications. The ideal candidate will have a strong background in Python and data engineering, with a focus on optimizing data for LLMs. Responsibilities include creating high-quality datasets, managing data versioning, and ensuring efficient data retrieval workflows. This role offers the opportunity to work on cutting-edge AI projects in a dynamic environment.

Qualifications

  • 10 years of experience in a Data Engineering role.
  • 2 years of experience in an AI-adjacent data role.
  • Proficiency in Python, pandas, and text processing tools.

Responsibilities

  • Build ingestion pipelines for structured/unstructured data using Python.
  • Create high-quality task-specific datasets for training and evaluation.
  • Manage vector indexes and optimize retrieval workflows.

Skills

Python
Data Engineering
Text Processing

Tools

HuggingFace
Sentence Transformers
DVC
LakeFS
FAISS
Weaviate
MinIO
NFS
HuggingFace Tokenizers
SentencePiece

Job description

Build ingestion pipelines for structured/unstructured data using Python

Clean, normalize, and prepare data in formats suitable for LLM fine-tuning (e.g. JSONL, CSV)

Create high-quality, task-specific datasets for training and evaluation

Apply versioning to datasets using DVC or LakeFS for reproducibility

Generate embeddings using HuggingFace or Sentence Transformers

Manage vector indexes (FAISS, Weaviate) and optimize retrieval workflows

Tokenize and chunk long-form data for context window optimization


Requirements

10 years of experience in a Data Engineering role

2 years of experience in an AI-adjacent data role

Proficiency in Python, pandas, and text processing tools

Familiarity with tokenization libraries (HuggingFace Tokenizers, SentencePiece)

Experience managing datasets and object storage (MinIO, NFS)

Understanding of LLM data constraints (context windows, formatting, prompt injection)

