AI / Data Engineer

Client of Business Umbrella

Abu Dhabi

On-site

AED 40,000 - 60,000

Full time

9 days ago

Job summary

A leading recruitment firm is seeking an experienced Data Engineer to develop ingestion pipelines and manage complex datasets in Abu Dhabi. The role requires a strong background in Python and data management, alongside a deep understanding of LLM data constraints. Candidates should have over 10 years of experience in data engineering and possess relevant technical skills to support cutting-edge AI projects.

Qualifications

  • 10+ years’ experience in a data engineering role.
  • 2+ years’ experience in an AI-adjacent data role.
  • Proficiency in Python, pandas, and text processing tools.

Responsibilities

  • Build ingestion pipelines for structured/unstructured data using Python.
  • Clean, normalize, and prepare data formats suitable for LLM fine-tuning.
  • Manage vector indexes and optimize retrieval workflows.

Skills

Data Engineering
Python
Data Management
Text Processing
Tokenization

Education

Bachelor's in Computer Applications

Tools

pandas
HuggingFace
FAISS
LakeFS
MinIO
NFS

Job description

Bachelor's in Computer Applications (Computers)

Nationality

Any Nationality

Vacancy

1 Vacancy

Job Description

· Build ingestion pipelines for structured/unstructured data using Python

· Clean, normalize, and prepare data formats suitable for LLM fine-tuning (e.g., JSONL, CSV)

· Create high-quality, task-specific datasets for training and evaluation

· Apply versioning to datasets using DVC or LakeFS for reproducibility

· Generate embeddings using HuggingFace or Sentence Transformers

· Manage vector indexes (FAISS, Weaviate) and optimize retrieval workflows

· Tokenize and chunk long-form data for context window optimization

Requirements

· 10+ years’ experience in a data engineering role

· 2+ years’ experience in an AI-adjacent data role

· Proficiency in Python, pandas, and text processing tools

· Familiarity with tokenization libraries (HuggingFace Tokenizers, SentencePiece)

· Experience managing datasets and object storage (MinIO, NFS)

· Understanding of LLM data constraints (context windows, formatting, prompt injection)

Company Industry

  • Recruitment
  • Placement Firm
  • Executive Search

Department / Functional Area

  • IT Software
