Senior Software Engineer

Virtusa

Dubai

On-site

AED 250,000 - 320,000

Full time

19 days ago

Job summary

A leading technology company in Dubai is seeking a Senior Software Engineer to build scalable data ingestion and streaming platforms. The candidate will design and develop streaming ingestion pipelines, ensuring quality and fault tolerance in data processing. Ideal applicants should have 5–8 years of experience with Apache Spark or similar frameworks, strong SQL skills, and expertise in streaming systems like Kafka. This role offers a chance to work on innovative data solutions within a collaborative team environment.

Qualifications

5–8 years of experience designing and building data pipelines using Apache Spark or similar frameworks.
Hands-on expertise with streaming systems like Apache Kafka, Confluent Cloud, or RabbitMQ.
Deep understanding of relational databases and change data capture (CDC).
Proficiency in programming languages such as Python, Scala, or Java.

Responsibilities

Design and develop streaming ingestion pipelines.
Implement change data capture and deduplication logic.
Ensure data quality and fault tolerance.
Collaborate on data platform architecture.

Skills

Kafka

Confluent

Apache Spark

Data Lake Integration

Real-Time Data Ingestion

Python

Scala

Java

SQL

Tools

Databricks

Terraform

Jenkins

Airflow

Job Description - Senior Software Engineer (CREQ242894)

Kafka, Confluent
Real-Time, Streaming Data Ingestion
Producers, Consumers, Topics
Data Lake Integration, Lakehouse Integration

Databricks Engineer | Positions: 2

Overview

Duration: 3 Months | Location: Dubai. The bank is building a scalable data ingestion and streaming platform that ingests change data capture (CDC) events from diverse source systems (databases and applications), processes them in real time, and lands curated data in its analytics lake.

Responsibilities

Design and develop streaming ingestion pipelines. Use Apache Spark (Structured Streaming) and Databricks Auto Loader to consume files from cloud storage or messages from Kafka/RabbitMQ/Confluent Cloud and ingest them into Delta Lake, ensuring schema evolution and exactly‑once semantics.
Implement CDC and deduplication logic. Capture change events from source databases using Debezium, the built‑in CDC features of SQL Server/Oracle, or other connectors. Apply watermarking and deduplication strategies based on primary keys and event timestamps.
Ensure data quality and fault tolerance. Configure checkpointing, error handling and dead‑letter queues (DLQ) so that malformed or late data can be quarantined and replayed. Optimize file sizes, partitioning and clustering to maintain performance.
Scale ingestion through configuration. Build a config‑driven framework (e.g., using Airflow, DBX Jobs or Delta Live Tables) that iterates over metadata tables to deploy/update ingestion pipelines for hundreds of tables/sources without code duplication.
Collaborate on architecture and orchestration. Contribute to the overall data platform architecture—integrating data sources, message queues, processing engines and storage—and define orchestration patterns for backfill, replay and streaming jobs.
Implement monitoring, observability and security. Capture streaming query metrics and publish them to monitoring platforms (Prometheus, Grafana). Set up dashboards for lag, files processed and processing duration. Enforce role‑based access control, encryption and data masking.
Work with data consumers. Partner with analytics teams, data scientists and downstream application developers to ensure that ingested data meets their requirements. Provide documentation, metadata and lineage for all tables.
Participate in DevOps processes. Use CI/CD pipelines (e.g., Jenkins, GitHub Actions) to automate deployment of jobs; manage infrastructure with Terraform or similar tools; follow best practices for version control and code reviews.
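
As an illustration of the deduplication responsibility above, here is a minimal sketch in plain Python (not Spark). It keeps the latest change event per primary key and skips events that fall behind a watermark. The field names (`pk`, `event_ts`, `payload`), the function name `dedupe_latest` and the `max_lateness` parameter are hypothetical and do not come from this role description. A production pipeline would more likely use the watermarking and duplicate handling built into Spark Structured Streaming.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    pk: str         # primary key of the source row (hypothetical field name)
    event_ts: int   # event time in epoch seconds (hypothetical field name)
    payload: str    # row contents after the change


def dedupe_latest(events: Iterable[ChangeEvent], max_lateness: int) -> List[ChangeEvent]:
    """Keep the newest event per primary key, skipping events behind the watermark.

    The watermark is the highest event time seen so far minus max_lateness.
    """
    latest: Dict[str, ChangeEvent] = {}
    max_seen: Optional[int] = None
    for ev in events:
        if max_seen is None or ev.event_ts > max_seen:
            max_seen = ev.event_ts
        watermark = max_seen - max_lateness
        if ev.event_ts < watermark:
            continue  # too late; a real pipeline might route this to a DLQ
        current = latest.get(ev.pk)
        if current is None or ev.event_ts > current.event_ts:
            latest[ev.pk] = ev  # newer change for this key replaces the old one
    return sorted(latest.values(), key=lambda e: e.pk)


events = [
    ChangeEvent("a", 100, "v1"),
    ChangeEvent("b", 105, "v1"),
    ChangeEvent("a", 110, "v2"),
    ChangeEvent("a", 100, "v1"),  # exact duplicate of the first event
    ChangeEvent("b", 200, "v2"),
    ChangeEvent("c", 50, "v1"),   # arrives far behind the watermark
]
result = dedupe_latest(events, max_lateness=60)
for ev in result:
    print(ev.pk, ev.event_ts, ev.payload)
```

In this sketch the duplicate and the late event are simply skipped. Quarantining them for replay, as the data quality item describes, would be a separate step.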

Required Skills & Experience

5–8 years of experience designing and building data pipelines using Apache Spark, Databricks or equivalent big‑data frameworks.
Hands‑on expertise with streaming and messaging systems such as Apache Kafka (publish‑subscribe architecture), Confluent Cloud, RabbitMQ or Azure Event Hubs. Experience creating producers, consumers and topics and integrating them into downstream processing.
Deep understanding of relational databases and CDC. Proficiency in SQL Server, Oracle or other RDBMSs; experience capturing change events using Debezium or native CDC tools and transforming them for downstream consumption.
Proficiency in programming languages such as Python, Scala or Java and solid knowledge of SQL for data manipulation and transformation.
Cloud platform expertise. Experience with Azure or AWS services for data storage, compute and orchestration (e.g., ADLS, S3, Azure Data Factory, AWS Glue, Airflow, DBX, Delta Live Tables (DLT)).
Data modelling and warehousing. Knowledge of data lakehouse architectures, Delta Lake, partitioning strategies and performance optimisation.
Version control and DevOps. Familiarity with Git and CI/CD pipelines; ability to automate deployment and manage infrastructure as code.
Strong problem‑solving and communication skills. Ability to work with cross‑functional teams and articulate complex technical concepts to non‑technical stakeholders.

Preferred/Bonus Skills

Experience with event‑driven architectures and micro‑services integration.
Exposure to NiFi, Flume or other ingestion frameworks for connecting heterogeneous sources.
Knowledge of graph processing or machine learning pipelines on Spark.
