Data Engineer (SA25)

Barone, Budge & Dominick (Pty) Ltd

Johannesburg

Hybrid

ZAR 600 000 - 800 000

Full time

Job summary

A leading software solutions provider in Johannesburg seeks a skilled Data Engineer responsible for building and maintaining data pipelines and architectures. Candidates should have over 5 years of experience, particularly with Databricks, and excel in Python, AWS, and Azure environments. The role involves collaboration with various teams to support data-driven decision-making and enhance data governance processes. Competitive compensation and a flexible hybrid work environment are offered.

Benefits

Flexible work environment
Competitive bonuses
Continuous learning opportunities

Qualifications

  • Minimum of 5 years of professional experience, including at least 2 years with Databricks.
  • Strong coding skills in Python required for data manipulation.
  • Experience with cloud platforms AWS and Azure.

Responsibilities

  • Design, build and maintain ETL/ELT pipelines using Python, SQL, and Spark.
  • Manage cloud-based data infrastructure and ensure performance.
  • Collaborate with Data Scientists and Engineers to understand data needs.

Skills

Strong proficiency in Python
Experience with Apache Spark
Proficiency with Apache Airflow
Expert SQL skills
Deep understanding of Big Data file formats

Certifications

AWS Certified Solutions Architect – Associate
Microsoft Certified: Azure Solutions Architect Expert
Databricks Certification

Tools

Databricks
AWS
Azure
Apache Airflow
Snowflake

Job description

BBD is an international custom software solutions company that solves real-world problems with innovative solutions and modern technology stacks. With extensive experience across various sectors and a wide array of technologies, BBD’s core services encompass digital enablement, software engineering and solutions support, which includes cloud engineering, data science, product design and managed services.

Over the past 40 years, we have built a reputation for hiring the best talent and collaborating with client teams to deliver exceptional value through software. As the company has grown, this unwavering commitment to quality and continuous innovation has ensured clients get the full benefit from software that fits their unique environment.

BBD’s culture is one that encourages collaboration, innovation and inclusion. Our relaxed yet professional work environment extends into a flat management structure. At BBD, you are not just a number, but a valuable member of the team, working with like‑minded, passionate individuals on challenging projects in interesting spaces. We deeply believe in the importance of each individual taking control of their career growth, with the support, encouragement and guidance of the company. We do this for every BBDer, creating the space and opportunity to continue learning, growing and expanding their skillsets. We also proudly support and ensure diverse project teams as varied perspectives will always make for stronger solutions.

With hubs in 7 cities, we have mastered distributed development and support a flexible, hybrid working environment. Our hubs are also a great place to get to know people, share knowledge, and enjoy snacks, great coffee and catered lunches as well as social, sport and cultural gatherings.

Lastly, recognition is deeply ingrained in the BBD culture and we use every appropriate opportunity to show this through our Awards Nominations, shoutouts and of course the exceptional bonuses that come from exceptional performance.

The role

BBD is looking for a skilled Data Engineer to design, build and maintain scalable data pipelines and architectures. You will play a pivotal role in enabling data‑driven decision‑making by ensuring our data infrastructure is robust, secure and efficient. You will work with modern tools and cloud platforms (AWS, Azure, Databricks) to transform raw data into actionable insights, supporting both traditional analytics and emerging AI/ML workloads.

Responsibilities
  • Pipeline development: Design, build and maintain efficient, reliable and scalable ETL/ELT pipelines using Python, SQL, and Spark (an illustrative sketch follows this list)
  • Architecture & modelling: Implement modern data architectures (e.g., Data Lakehouse, Medallion Architecture) and data models to support business reporting and advanced analytics
  • Cloud infrastructure: Manage and optimise cloud‑based data infrastructure on AWS and Azure, ensuring cost‑effectiveness and performance
  • Data governance: Implement data governance, security and quality standards (e.g., using Great Expectations, Unity Catalog) to ensure data integrity and compliance
  • Collaboration: Work closely with Data Scientists, AI Engineers and Business Analysts to understand data requirements and deliver high‑quality datasets
  • MLOps support: Collaborate on MLOps practices, supporting model deployment and monitoring through robust data foundations
  • Continuous improvement: Monitor pipeline performance, troubleshoot issues, and drive automation using CI/CD practices
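To ground the pipeline and architecture bullets above, the following is a minimal, illustrative PySpark sketch of a bronze-to-silver step in a Medallion-style Delta Lake pipeline. The paths, table names and columns are hypothetical placeholders rather than BBD's or any client's actual implementation, and a real pipeline would add error handling, incremental loads and governance controls.

    # Illustrative sketch only: bronze -> silver step of a Medallion-style pipeline.
    # Assumes a Spark session with Delta Lake support (e.g., Databricks or
    # delta-spark configured locally); all paths and columns are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-bronze-to-silver").getOrCreate()

    # Bronze: land the raw source data as-is, stamped with an ingestion time.
    raw = (spark.read.json("s3://example-lake/raw/orders/")
           .withColumn("ingested_at", F.current_timestamp()))
    raw.write.format("delta").mode("append").save("s3://example-lake/bronze/orders")

    # Silver: deduplicate, drop bad records and conform types for analytics.
    silver = (spark.read.format("delta").load("s3://example-lake/bronze/orders")
              .dropDuplicates(["order_id"])
              .filter(F.col("order_amount") > 0)
              .withColumn("order_date", F.to_date("order_ts")))
    silver.write.format("delta").mode("overwrite").save("s3://example-lake/silver/orders")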
Requirements
  • A minimum of 5 years of professional experience, with at least 2 years of experience with Databricks
Skills and Experience
  • Programming & scripting: Strong proficiency in Python for data manipulation and scripting. Experience with Scala or Java is a plus
  • Big Data processing: Extensive experience with Apache Spark (PySpark) for batch and streaming data processing
  • Workflow orchestration: Proficiency with Apache Airflow or similar tools (e.g., Prefect, Dagster, Azure Data Factory) for scheduling and managing complex workflows (a minimal DAG sketch follows this list)
  • Data warehousing: Proficiency in modern cloud data warehouses such as Snowflake, including designing, modelling and optimising analytical data structures to support reporting, BI and downstream analytics
  • Expert SQL skills for analysis and transformation
  • Deep understanding of Big Data file formats (Parquet, Avro, Delta Lake)
  • Experience designing Data Lakes and implementing patterns like the Medallion Architecture (Bronze/Silver/Gold layers)
  • Streaming: Experience with real‑time data processing using Kafka or similar streaming platforms
  • DevOps & CI/CD: Proficiency with Git for version control. Experience implementing CI/CD pipelines for data infrastructure (e.g., GitHub Actions, GitLab CI, Azure DevOps)
  • Familiarity with data quality frameworks like Great Expectations or Soda
  • Understanding of data governance principles, security, and lineage
  • Reporting & visualisation: Experience serving data to BI tools like Power BI, Tableau, or Looker
  • AI/ML familiarity: Exposure to Generative AI concepts (LLMs, RAG, Vector Search) and how data engineering supports them
AWS
  • Storage: Deep knowledge of Amazon S3 for data lake storage, including lifecycle policies and security configurations
  • ETL & orchestration: Hands‑on experience with AWS Glue (Crawlers, Jobs, Workflows, Data Catalog) for serverless data integration
  • Governance: Experience with AWS Lake Formation for centrally managing security and access controls
  • Streaming: Proficiency with Amazon Kinesis (Data Streams, Firehose) for collecting and processing real‑time data
  • Core services: Solid understanding of core AWS services (IAM, Lambda, EC2, CloudWatch) relevant to data engineering
Azure
  • Storage: Deep knowledge of Azure Data Lake Storage (ADLS) Gen2 and Blob Storage
  • ETL & orchestration: Experience with Azure Data Factory (ADF) or Azure Synapse Analytics pipelines for data integration and orchestration
  • Governance: Familiarity with Microsoft Purview for unified data governance and Microsoft Entra ID (formerly Azure AD) for access management
  • Streaming: Proficiency with Azure Event Hubs or Azure Stream Analytics for real‑time data ingestion
  • Core Services: Understanding of core Azure services (Resource Groups, VNets, Azure Monitor) relevant to data solutions
Databricks
  • Platform management: Experience managing Databricks Workspaces, clusters, and compute resources
  • Governance: Proficiency with Unity Catalog for centralised access control, auditing, and data lineage
  • Development: Building and orchestrating Databricks Jobs and Delta Live Tables (DLT) pipelines
  • Deep knowledge of Delta Lake features (time travel, schema enforcement, optimisation)
  • AI & ML integration: Experience with MLflow for experiment tracking and model registry
  • Exposure to Mosaic AI features (Model Serving, Vector Search, AI Gateway) and managing LLM workloads on Databricks
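The orchestration point above can be illustrated with a minimal Apache Airflow DAG. The task names, schedule and logic are hypothetical placeholders (assuming Airflow 2.4+), and a production DAG would typically trigger Spark, Databricks or Glue jobs rather than simple Python callables.

    # Illustrative sketch only: a daily ingest -> transform dependency in Airflow.
    # Assumes Apache Airflow 2.4+; DAG name and task bodies are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest_orders(**context):
        # Placeholder for an extraction step (e.g., landing files in the data lake).
        print(f"Ingesting orders for {context['ds']}")

    def transform_orders(**context):
        # Placeholder for a Spark/Databricks transformation job.
        print(f"Transforming orders for {context['ds']}")

    with DAG(
        dag_id="orders_daily",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        ingest = PythonOperator(task_id="ingest_orders", python_callable=ingest_orders)
        transform = PythonOperator(task_id="transform_orders", python_callable=transform_orders)
        ingest >> transform  # transform runs only after ingest succeeds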
Required certifications
  • AWS Certified Solutions Architect – Associate
  • Microsoft Certified: Azure Solutions Architect Expert
  • Databricks Certification
Internal candidate profile

We are open to training internal candidates who demonstrate strong engineering fundamentals and a passion for data. Ideal internal candidates might currently be in the following roles:

  • Python Back‑end Engineer: Strong coding skills (Python) and experience with APIs / back‑end systems, looking to specialise in big data processing and distributed systems
  • DevOps Engineer: Coding background with strong infrastructure‑as‑code and CI/CD skills, interested in applying those practices specifically to data pipelines and MLOps

BBD is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, family, gender identity or expression, genetic information, marital status, political affiliation, race, religion or any other characteristic protected by applicable laws, regulations or ordinances.
