
A comprehensive IT consulting firm in Cape Town seeks an experienced Data Engineer to design and optimize data pipelines using AWS services and Databricks. Candidates should possess a relevant degree and over 5 years of experience in data engineering, with advanced skills in ETL processes, DBT Core, and data monitoring. Strong communication and problem-solving abilities are essential in this role.
Key Responsibilities:
Design & Develop Data Pipelines: Build and optimize scalable, reliable, and automated ETL/ELT pipelines using AWS services (e.g., AWS Glue, AWS Lambda, Redshift, S3) and Databricks.
DBT Core Implementation: Lead the implementation of DBT Core to automate data transformations, develop reusable models, and maintain efficient ELT processes.
Optimize Data Workflows: Monitor, troubleshoot, and optimize data pipelines for performance and cost-efficiency in cloud environments. Utilize Databricks for processing large-scale data sets and streamlining data workflows.
Data Quality & Monitoring: Ensure high-quality data by implementing data validation and monitoring systems. Troubleshoot data issues and create solutions to ensure data reliability.
Automation & CI/CD: Implement CI/CD practices for data pipeline deployment and maintain automation for monitoring and scaling data infrastructure in AWS and Databricks.
Documentation & Best Practices: Maintain comprehensive documentation for data pipelines, architectures, and best practices in AWS, Databricks, and DBT Core. Ensure knowledge sharing across teams.
Skills & Qualifications:
Required:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
5+ years of experience as a Data Engineer or in a similar role.
Extensive hands-on experience with AWS services (S3, Redshift, Glue, Lambda, Kinesis, etc.) for building scalable and reliable data solutions.
Advanced expertise in Databricks, including the creation and optimization of data pipelines and notebooks, and integration with AWS services.
Strong experience with DBT Core for data transformation and modelling, including writing, testing, and maintaining DBT models.
Proficiency in SQL, with experience designing and optimizing complex queries for large datasets.
Strong programming skills in Python/PySpark, with the ability to develop custom data processing logic and automate tasks.
Experience with data warehousing and knowledge of OLAP and OLTP concepts.
Expertise in building and managing ETL/ELT pipelines, automating data workflows, and performing data validation.
Familiarity with CI/CD concepts, version control (e.g., Git), and deployment automation.
Experience working in an Agile project environment.
Preferred:
Exposure to ingestion tools such as Matillion or Fivetran.
Experience with Apache Spark and distributed data processing in Databricks.
Familiarity with streaming data solutions (e.g., AWS Kinesis, Apache Kafka).
Soft Skills:
Excellent communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
Strong analytical and problem-solving skills, with the ability to troubleshoot complex data pipeline issues.