Data Engineer

Capgemini Engineering

Al Khobar

Hybrid

SAR 150,000 - 200,000

Full time

15 days ago

Job summary

A global engineering services leader is seeking an experienced Data Engineer in Al Khobar, Saudi Arabia. In this role, you will design and optimize scalable data infrastructure, develop ETL workflows, and integrate diverse data sources. The ideal candidate has over 5 years of experience, expertise in Apache Spark, and knowledge of cloud services. Join a dynamic team with flexible work arrangements and opportunities for growth.

Benefits

Flexible work arrangements
Career growth programs
Access to certifications

Qualifications

  • 5+ years of experience in data engineering and distributed systems.
  • Experience developing data APIs and working with MLOps tools.
  • Familiarity with data governance frameworks.

Responsibilities

  • Design and maintain data pipelines for structured and unstructured data.
  • Integrate diverse data sources (APIs, databases, streams, flat files).
  • Ensure compliance with data privacy standards (PII, GDPR, HIPAA).

Skills

Expertise in Apache Spark
Strong skills in SQL
Hands-on experience with cloud services
Proficiency in data formats like Parquet
Experience with Docker

Education

Bachelor’s or Master’s in Computer Science or related field

Tools

Apache Kafka
Airflow
Docker

Job description

Get the future you want!

At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world’s most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and engineering services across all industries. Join us for a career full of opportunities, where you can make a difference and where no two days are the same.

Your Role

We are looking for a passionate and experienced Data Engineer to join our growing team. In this role, you will design, build, and optimize scalable data infrastructure that powers intelligent decision-making across industries. You’ll work with cutting-edge technologies to integrate diverse data sources, build real-time and batch pipelines, and ensure data quality, governance, and performance. You’ll collaborate with cross-functional teams to deliver robust, secure, and high-performance data solutions that drive innovation and business value.

Key Responsibilities
  • Design and maintain data pipelines for structured, semi-structured, and unstructured data
  • Optimize Apache Spark for distributed processing and scalability
  • Manage data lakes and implement Delta Lake for ACID compliance and lineage
  • Integrate diverse data sources (APIs, databases, streams, flat files)
  • Build real-time streaming pipelines using Apache Kafka
  • Automate workflows using Airflow and containerize solutions with Docker
  • Leverage cloud platforms (AWS, Azure, GCP) for scalable infrastructure
  • Develop ETL workflows to transform raw data into actionable insights
  • Ensure compliance with data privacy standards (PII, GDPR, HIPAA)
  • Build APIs to serve processed data to downstream systems
  • Implement CI/CD pipelines and observability tools (Prometheus, Grafana, Datadog)
Your Profile
  • Bachelor’s or Master’s in Computer Science, Data Engineering, or related field
  • 5+ years of experience in data engineering and distributed systems
  • Expertise in Apache Spark and Delta Lake
  • Hands‑on experience with cloud services (AWS, Azure, GCP)
  • Strong skills in SQL and NoSQL databases (PostgreSQL, MongoDB, Cassandra)
  • Proficiency in data formats like Parquet, Avro, JSON, XML
  • Experience with Airflow, Docker, and CI/CD pipelines
  • Familiarity with data governance and compliance frameworks
  • Strong understanding of data quality, lineage, and error handling
  • Experience developing data APIs and working with MLOps tools
Preferred Skills
  • Experience with Kubernetes for container orchestration
  • Knowledge of data warehouses (Snowflake, Redshift, Synapse)
  • Familiarity with real‑time analytics platforms (Flink, Druid, ClickHouse)
  • Exposure to machine learning pipelines and IoT data integration
  • Understanding of graph databases (Neo4j) and data cataloging tools (Apache Atlas, Alation)
  • Experience with data versioning tools like DVC
What You’ll Love About Working Here
  • Flexible work arrangements including remote options and flexible hours
  • Career growth programs and diverse opportunities to help you thrive
  • Access to certifications in the latest technologies and platforms
About Capgemini

Capgemini is a global leader in partnering with companies to transform and manage their business by harnessing the power of technology. The Group is guided every day by its purpose of unleashing human energy through technology for an inclusive and sustainable future. It is a responsible and diverse organization of over 360,000 team members in more than 50 countries. With its strong 55-year heritage and deep industry expertise, Capgemini is trusted by its clients to address the entire breadth of their business needs, from strategy and design to operations, fueled by the fast-evolving and innovative world of cloud, data, AI, connectivity, software, digital engineering, and platforms. In 2022, the Group reported global revenues of €22 billion.

Apply now!
