Data Engineer

Flow Talent

United Arab Emirates

On-site

AED 80,000 - 120,000

Full time

Yesterday

Job summary

An established industry player is seeking a skilled Data Engineer to join their team in Abu Dhabi. The role involves preparing and managing datasets for AI workflows, building ingestion pipelines in Python, and ensuring data quality for LLM fine-tuning. The ideal candidate will have over 10 years of experience in Data Engineering and at least 2 years in an AI-adjacent data role. You will work with embedding, vector search, and dataset versioning tools to optimize retrieval workflows and improve dataset quality. If you are passionate about data and AI, this opportunity is perfect for you.

Qualifications

  • 10+ years of experience in Data Engineering and 2+ years in an AI-adjacent data role.
  • Strong proficiency in Python and data management tools.

Responsibilities

  • Prepare and manage datasets for LLM fine-tuning and AI workflows.
  • Build ingestion pipelines for structured and unstructured data.

Skills

Data Engineering
Python
Data Management
Text Processing
Dataset Versioning (DVC, lakeFS)
Data Normalization
Tokenization
AI Workflows

Education

Bachelor's Degree in Computer Science or related field

Tools

HuggingFace
Sentence Transformers
FAISS
Weaviate
pandas
MinIO
NFS

Job description

A reputable and well-established technology company is actively recruiting a Data Engineer for their team in Abu Dhabi. Please note that you must meet all the criteria set out below for your application to be considered. Suitable candidates will be contacted within 5 working days. If you have not heard from us within that time, please consider your application unsuccessful on this occasion.

Main Responsibilities:
  1. Prepare and manage datasets that support LLM fine-tuning and AI workflows.
  2. Build ingestion pipelines for structured and unstructured data using Python.
  3. Clean and normalize data, and convert it into formats suitable for LLM fine-tuning (e.g., JSONL, CSV).
  4. Create high-quality, task-specific datasets for training and evaluation.
  5. Apply versioning to datasets using DVC or LakeFS for reproducibility.
  6. Generate embeddings using HuggingFace or Sentence Transformers.
  7. Manage vector indexes (FAISS, Weaviate) and optimize retrieval workflows.
  8. Tokenize and chunk long-form data for context window optimization.

Qualifications:
  1. 10+ years of experience in a Data Engineering role.
  2. 2+ years of experience in an AI-adjacent data role.
  3. Experience managing datasets and object storage (MinIO, NFS).
  4. Proficiency in Python, pandas, and text processing tools.
  5. Familiarity with tokenization libraries (HuggingFace Tokenizers, SentencePiece).
  6. Understanding of LLM data constraints (context windows, formatting, prompt injection).
  7. Applicants should be available for face-to-face interviews in Abu Dhabi.