Data Engineer (SA25)

Barone, Budge & Dominick (Pty) Ltd

Johannesburg

Hybrid

ZAR 600 000 - 800 000

Full time

Job summary

A leading software solutions provider in Johannesburg seeks a skilled Data Engineer responsible for building and maintaining data pipelines and architectures. Candidates should have over 5 years of experience, particularly with Databricks, and excel in Python, AWS, and Azure environments. The role involves collaboration with various teams to support data-driven decision-making and enhance data governance processes. Competitive compensation and a flexible hybrid work environment are offered.

Benefits

Flexible work environment
Competitive bonuses
Continuous learning opportunities

Qualifications

  • Minimum of 5 years of professional experience, including at least 2 years with Databricks.
  • Strong coding skills in Python required for data manipulation.
  • Experience with cloud platforms AWS and Azure.

Responsibilities

  • Design, build and maintain ETL/ELT pipelines using Python, SQL, and Spark.
  • Manage cloud-based data infrastructure and ensure performance.
  • Collaborate with Data Scientists and Engineers to understand data needs.

Skills

Strong proficiency in Python
Experience with Apache Spark
Proficiency with Apache Airflow
Expert SQL skills
Deep understanding of Big Data file formats

Certifications

AWS Certified Solutions Architect – Associate
Microsoft Certified: Azure Solutions Architect Expert
Databricks Certification

Tools

Databricks
AWS
Azure
Apache Airflow
Snowflake

Job description

BBD is an international custom software solutions company that solves real-world problems with innovative solutions and modern technology stacks. With extensive experience across various sectors and a wide array of technologies, BBD’s core services encompass digital enablement, software engineering and solutions support, which includes cloud engineering, data science, product design and managed services.

Over the past 40 years, we have built a reputation for hiring the best talent and collaborating with client teams to deliver exceptional value through software. As the company has grown, this unwavering commitment to quality and continuous innovation has ensured clients get the full benefit from software that fits their unique environment.

BBD’s culture is one that encourages collaboration, innovation and inclusion. Our relaxed yet professional work environment extends into a flat management structure. At BBD, you are not just a number, but a valuable member of the team, working with like‑minded, passionate individuals on challenging projects in interesting spaces. We deeply believe in the importance of each individual taking control of their career growth, with the support, encouragement and guidance of the company. We do this for every BBDer, creating the space and opportunity to continue learning, growing and expanding their skillsets. We also proudly support and ensure diverse project teams as varied perspectives will always make for stronger solutions.

With hubs in 7 cities, we have mastered distributed development and support a flexible, hybrid working environment. Our hubs are also a great place to get to know people, share knowledge, and enjoy snacks, great coffee and catered lunches as well as social, sport and cultural gatherings.

Lastly, recognition is deeply ingrained in the BBD culture and we use every appropriate opportunity to show this through our Awards Nominations, shoutouts and of course the exceptional bonuses that come from exceptional performance.

The role

BBD is looking for a skilled Data Engineer to design, build and maintain scalable data pipelines and architectures. You will play a pivotal role in enabling data‑driven decision‑making by ensuring our data infrastructure is robust, secure and efficient. You will work with modern tools and cloud platforms (AWS, Azure, Databricks) to transform raw data into actionable insights, supporting both traditional analytics and emerging AI/ML workloads.

Responsibilities
  • Pipeline development: Design, build and maintain efficient, reliable and scalable ETL/ELT pipelines using Python, SQL, and Spark (an illustrative sketch follows this list)
  • Architecture & modelling: Implement modern data architectures (e.g., Data Lakehouse, Medallion Architecture) and data models to support business reporting and advanced analytics
  • Cloud infrastructure: Manage and optimise cloud‑based data infrastructure on AWS and Azure, ensuring cost‑effectiveness and performance
  • Data governance: Implement data governance, security and quality standards (e.g., using Great Expectations, Unity Catalog) to ensure data integrity and compliance
  • Collaboration: Work closely with Data Scientists, AI Engineers and Business Analysts to understand data requirements and deliver high‑quality datasets
  • MLOps support: Collaborate on MLOps practices, supporting model deployment and monitoring through robust data foundations
  • Continuous improvement: Monitor pipeline performance, troubleshoot issues, and drive automation using CI/CD practices
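To ground the pipeline and architecture bullets above, the following is a minimal, illustrative PySpark sketch of a bronze-to-silver step in a Medallion-style Delta Lake pipeline. The paths, table names and columns are hypothetical placeholders rather than BBD's or any client's actual implementation, and a real pipeline would add error handling, incremental loads and governance controls.

    # Illustrative sketch only: bronze -> silver step of a Medallion-style pipeline.
    # Assumes a Spark session with Delta Lake support (e.g., Databricks or
    # delta-spark configured locally); all paths and columns are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-bronze-to-silver").getOrCreate()

    # Bronze: land the raw source data as-is, stamped with an ingestion time.
    raw = (spark.read.json("s3://example-lake/raw/orders/")
           .withColumn("ingested_at", F.current_timestamp()))
    raw.write.format("delta").mode("append").save("s3://example-lake/bronze/orders")

    # Silver: deduplicate, drop bad records and conform types for analytics.
    silver = (spark.read.format("delta").load("s3://example-lake/bronze/orders")
              .dropDuplicates(["order_id"])
              .filter(F.col("order_amount") > 0)
              .withColumn("order_date", F.to_date("order_ts")))
    silver.write.format("delta").mode("overwrite").save("s3://example-lake/silver/orders")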
Requirements
  • A minimum of 5 years of professional experience, with at least 2 years of experience with Databricks
Skills and Experience
  • Programming & scripting: Strong proficiency in Python for data manipulation and scripting. Experience with Scala or Java is a plus
  • Big Data processing: Extensive experience with Apache Spark (PySpark) for batch and streaming data processing
  • Workflow orchestration: Proficiency with Apache Airflow or similar tools (e.g., Prefect, Dagster, Azure Data Factory) for scheduling and managing complex workflows (a minimal DAG sketch follows this list)
  • Data warehousing: Proficiency in modern cloud data warehouses such as Snowflake, including designing, modelling and optimising analytical data structures to support reporting, BI and downstream analytics
  • Expert SQL skills for analysis and transformation
  • Deep understanding of Big Data file formats (Parquet, Avro, Delta Lake)
  • Experience designing Data Lakes and implementing patterns like the Medallion Architecture (Bronze/Silver/Gold layers)
  • Streaming: Experience with real‑time data processing using Kafka or similar streaming platforms
  • DevOps & CI/CD: Proficiency with Git for version control. Experience implementing CI/CD pipelines for data infrastructure (e.g., GitHub Actions, GitLab CI, Azure DevOps)
  • Familiarity with data quality frameworks like Great Expectations or Soda
  • Understanding of data governance principles, security, and lineage
  • Reporting & visualisation: Experience serving data to BI tools like Power BI, Tableau, or Looker
  • AI/ML familiarity: Exposure to Generative AI concepts (LLMs, RAG, Vector Search) and how data engineering supports them
AWS
  • Storage: Deep knowledge of Amazon S3 for data lake storage, including lifecycle policies and security configurations
  • ETL & orchestration: Hands‑on experience with AWS Glue (Crawlers, Jobs, Workflows, Data Catalog) for serverless data integration
  • Governance: Experience with AWS Lake Formation for centrally managing security and access controls
  • Streaming: Proficiency with Amazon Kinesis (Data Streams, Firehose) for collecting and processing real‑time data
  • Core services: Solid understanding of core AWS services (IAM, Lambda, EC2, CloudWatch) relevant to data engineering
Azure
  • Storage: Deep knowledge of Azure Data Lake Storage (ADLS) Gen2 and Blob Storage
  • ETL & orchestration: Experience with Azure Data Factory (ADF) or Azure Synapse Analytics pipelines for data integration and orchestration
  • Governance: Familiarity with Microsoft Purview for unified data governance and Microsoft Entra ID (formerly Azure AD) for access management
  • Streaming: Proficiency with Azure Event Hubs or Azure Stream Analytics for real‑time data ingestion
  • Core Services: Understanding of core Azure services (Resource Groups, VNets, Azure Monitor) relevant to data solutions
Databricks
  • Platform management: Experience managing Databricks Workspaces, clusters, and compute resources
  • Governance: Proficiency with Unity Catalog for centralised access control, auditing, and data lineage
  • Development: Building and orchestrating Databricks Jobs and Delta Live Tables (DLT) pipelines
  • Deep knowledge of Delta Lake features (time travel, schema enforcement, optimisation)
  • AI & ML integration: Experience with MLflow for experiment tracking and model registry
  • Exposure to Mosaic AI features (Model Serving, Vector Search, AI Gateway) and managing LLM workloads on Databricks
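The orchestration point above can be illustrated with a minimal Apache Airflow DAG. The task names, schedule and logic are hypothetical placeholders (assuming Airflow 2.4+), and a production DAG would typically trigger Spark, Databricks or Glue jobs rather than simple Python callables.

    # Illustrative sketch only: a daily ingest -> transform dependency in Airflow.
    # Assumes Apache Airflow 2.4+; DAG name and task bodies are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest_orders(**context):
        # Placeholder for an extraction step (e.g., landing files in the data lake).
        print(f"Ingesting orders for {context['ds']}")

    def transform_orders(**context):
        # Placeholder for a Spark/Databricks transformation job.
        print(f"Transforming orders for {context['ds']}")

    with DAG(
        dag_id="orders_daily",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        ingest = PythonOperator(task_id="ingest_orders", python_callable=ingest_orders)
        transform = PythonOperator(task_id="transform_orders", python_callable=transform_orders)
        ingest >> transform  # transform runs only after ingest succeeds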
Required certifications
  • AWS Certified Solutions Architect – Associate
  • Microsoft Certified: Azure Solutions Architect Expert
  • Databricks Certification
Internal candidate profile

We are open to training internal candidates who demonstrate strong engineering fundamentals and a passion for data. Ideal internal candidates might currently be in the following roles:

  • Python Back‑end Engineer: Strong coding skills (Python) and experience with APIs / back‑end systems, looking to specialise in big data processing and distributed systems
  • DevOps Engineer: Coding background with strong infrastructure‑as‑code and CI/CD skills, interested in applying those practices specifically to data pipelines and MLOps

BBD is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, family, gender identity or expression, genetic information, marital status, political affiliation, race, religion or any other characteristic protected by applicable laws, regulations or ordinances.
