
Intermediate Senior Data Engineer (Databricks)

AICA Consultancy

Centurion

On-site

ZAR 600 000 - 800 000

Full time

30+ days ago

Job summary

A leading data consultancy firm in Centurion is seeking a Data Engineer certified in Databricks. The role involves designing, developing, and optimizing scalable data pipelines, transforming raw data for analytics, and implementing real-time streaming solutions. Candidates should have strong expertise in Python, SQL, and cloud platforms. This position offers a dynamic environment and the chance to enhance data systems significantly.

Qualifications

  • Databricks certification, with strong hands-on proficiency in the platform.
  • Strong experience in Python, SQL, and ETL/ELT development.
  • Familiarity with real-time data processing and streaming.
  • Knowledge of cloud platforms such as AWS, Azure, or GCP.

Responsibilities

  • Design and develop efficient ETL/ELT pipelines.
  • Optimize data workflows using best practices.
  • Integrate data from various sources including APIs and cloud storage.
  • Implement real-time data streaming solutions using Databricks.

Skills

Databricks
Spark
Delta Lake
Python
SQL
ETL development
Real-time data processing
Cloud platforms

Education

Databricks certification

Tools

AWS S3
Azure Data Lake
Unity Catalog

Job description

Overview

We are looking for a Data Engineer who is certified in Databricks (required) to join our team. In this role you will design, develop, and optimize scalable data pipelines and workflows on Databricks, working closely with stakeholders to ensure data reliability, performance, and alignment with business requirements.

Responsibilities
Data Pipeline Development
  • Building efficient ETL/ELT pipelines using Databricks and Delta Lake for structured, semi-structured, and unstructured data.
  • Transforming raw data into consumable datasets for analytics and machine learning.
Data Optimization
  • Improving performance by implementing best practices like partitioning, caching, and Delta Lake optimizations.
  • Resolving bottlenecks and ensuring scalability.
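For context on why partitioning is listed among the optimizations: a table partitioned on a column lets queries that filter on that column skip whole partitions ("partition pruning"). A plain-Python sketch of the idea (the dict-of-partitions layout here is illustrative, not a Databricks API):

```python
# Hypothetical layout: each partition key maps to the rows stored under it.
partitions = {
    "2024-01-01": [{"user": "alice", "amount": 42.5}],
    "2024-01-02": [{"user": "bob", "amount": 13.0}],
    "2024-01-03": [{"user": "carol", "amount": 7.0}],
}

def read_with_pruning(partitions, wanted_dates):
    """Scan only partitions matching the filter; report how many were read."""
    scanned = 0
    rows = []
    for date, part_rows in partitions.items():
        if date not in wanted_dates:
            continue  # pruned: this partition is never read
        scanned += 1
        rows.extend(part_rows)
    return rows, scanned

rows, scanned = read_with_pruning(partitions, {"2024-01-02"})
```

With the filter above, only one of three partitions is scanned; on a real Delta table the engine performs this pruning automatically from the partition metadata.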
Data Integration
  • Integrating data from various sources such as APIs, databases, and cloud storage systems (e.g., AWS S3, Azure Data Lake).
Real-Time Streaming
  • Designing and deploying real-time data streaming solutions using Databricks Structured Streaming.
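As a conceptual illustration of the tumbling-window aggregations that Databricks Structured Streaming computes incrementally over micro-batches, here is the same windowing logic in plain Python over a fixed event list (event timestamps are hypothetical):

```python
from collections import defaultdict
from datetime import datetime

events = [
    ("2024-01-01T10:00:10", 1),
    ("2024-01-01T10:00:50", 2),
    ("2024-01-01T10:01:05", 3),
]

def tumbling_window_counts(events, window_seconds=60):
    """Bucket events into fixed, non-overlapping windows keyed by window start."""
    counts = defaultdict(int)
    for ts, _value in events:
        epoch = int(datetime.fromisoformat(ts).timestamp())
        window_start = epoch - (epoch % window_seconds)  # align to window boundary
        counts[window_start] += 1
    return dict(counts)

counts = tumbling_window_counts(events)
```

In Structured Streaming the equivalent is a `groupBy(window(...))` aggregation that the engine maintains as new micro-batches arrive, rather than a one-shot pass over a list.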
Data Quality and Governance
  • Implementing data validation, schema enforcement, and monitoring to ensure high-quality data delivery.
  • Using Unity Catalog to manage metadata, access permissions, and data lineage.
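To illustrate the schema-enforcement idea mentioned above: Delta Lake rejects writes whose records do not match the table schema. A plain-Python validator sketching that check (field names and types here are hypothetical):

```python
# Hypothetical expected schema for an events table.
EXPECTED_SCHEMA = {"user": str, "amount": float, "event_date": str}

def validate(record, schema=EXPECTED_SCHEMA):
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"bad type for {field}: {type(record[field]).__name__}")
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")  # rejected on write
    return errors

good = {"user": "alice", "amount": 42.5, "event_date": "2024-01-01"}
bad = {"user": "bob", "amount": "13.0"}
```

Delta Lake applies this kind of check at write time automatically; explicit validation like the above is still useful for surfacing bad records upstream with a clear error report.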
Collaboration and Documentation
  • Collaborating with data analysts, data scientists, and other stakeholders to meet business needs.
  • Documenting pipelines, workflows, and technical solutions.
Deliverables
  • Fully functional and documented data pipelines.
  • Optimized and scalable data workflows on Databricks.
  • Real-time streaming solutions integrated with downstream systems.
  • Detailed documentation for implemented solutions and best practices.
Qualifications
  • Proficiency in Databricks (certified), Spark, and Delta Lake.
  • Strong experience with Python, SQL, and ETL/ELT development.
  • Familiarity with real-time data processing and streaming.
  • Knowledge of cloud platforms (e.g., AWS, Azure, GCP).
  • Experience with data governance and tools like Unity Catalog.
Assumptions
  • Access to necessary datasets and cloud infrastructure will be provided.
  • Stakeholders will provide timely input and feedback.
Success Metrics
  • Data pipelines deliver accurate and consistent data.
  • Workflows meet performance benchmarks.
  • Real-time streaming solutions operate with minimal latency.
  • Stakeholders are satisfied with the quality and usability of the solutions.