Overview
The Data Engineer is responsible for the development and support of our centralised, enterprise data workflows. These workflows automate the sourcing, transformation, and provisioning of data for consumers such as global and local applications, data models and reporting. A senior Data Engineer has proven skills and a delivery track record built over several years, and is able to work independently or within teams as a subject matter expert. Depending on the level of seniority, the engineer will be given responsibility for more technically challenging projects with greater financial impact and risk to the firm.
Data Pipeline Development
- Build and maintain data pipelines that collect data from source systems, store it within the Africa Data Warehouse, and process it efficiently (a minimal sketch follows this list).
- Build data endpoints that provision data for consumers and applications, both locally and globally.
- Ensure that provisioned data is standardised and consistent in line with the enterprise data catalogue definitions.
- Ensure that deployment of code is done through approved DevOps pipelines and follows the development, staging and production lifecycle.
- Ensure ongoing review and maintenance of pipelines so that they meet changing business and data rules and requirements and continue to run efficiently.
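For illustration only, a minimal PySpark sketch of a source-to-warehouse pipeline step. All paths, table and column names here are hypothetical placeholders, not the actual Africa Data Warehouse schema:

```python
# Minimal illustrative pipeline sketch (PySpark). Paths and column
# names are hypothetical placeholders, not the real warehouse schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_sales_load").getOrCreate()

# Extract: read raw data landed from a source system.
raw = spark.read.parquet("/landing/sales/2024-06-01/")  # hypothetical path

# Transform: standardise names and types before provisioning.
clean = (
    raw.withColumnRenamed("txn_amt", "transaction_amount")
       .withColumn("transaction_date", F.to_date("txn_dt", "yyyy-MM-dd"))
       .dropDuplicates(["transaction_id"])
)

# Load: write to the warehouse zone for downstream consumers.
clean.write.mode("overwrite").parquet("/warehouse/sales/daily/")
```

In practice a step like this would run on a schedule through the approved DevOps pipelines rather than as a standalone script.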
Data Transformation
- Converting raw data into formats that meet consumer requirements and/or are useful for analysis, which may involve cleaning and wrangling the data (see the sketch after this list). This includes:
- Data Cleaning: Removing or correcting errors, inconsistencies, and duplicates in the data to ensure its accuracy and reliability.
- Data Normalisation: Standardising data formats and structures to ensure consistency across different datasets.
- Data Enrichment: Enhancing data by adding relevant information from external sources, which can provide more context and value.
- Data Aggregation: Summarising detailed data into higher-level insights, such as calculating averages, totals, or other statistical measures.
- Data Integration: Combining data from various sources into a unified dataset, making it easier to analyse and derive insights.
- Data Formatting: Converting data into the required format for analysis or reporting.
- Ensuring that standard calculations are applied in line with data catalogue definitions.
- Ensuring ongoing review of transformation calculations to align with changing business and data rules.
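A compact sketch of several of these transformation steps together, using pandas with invented client and revenue columns purely for illustration:

```python
# Illustrative transformation steps (pandas); all data and column
# names are hypothetical.
import pandas as pd

raw = pd.DataFrame({
    "client_id": [101, 101, 102, 103],
    "country":   ["ZA", "ZA", "ke", "NG "],
    "revenue":   [1200.0, 1200.0, None, 850.0],
})

# Cleaning: drop exact duplicates and rows missing key measures.
clean = raw.drop_duplicates().dropna(subset=["revenue"])

# Normalisation: standardise formats (trimmed, upper-case country codes).
clean["country"] = clean["country"].str.strip().str.upper()

# Enrichment: add context from an external reference dataset.
regions = pd.DataFrame({"country": ["ZA", "NG"],
                        "region": ["Southern", "Western"]})
enriched = clean.merge(regions, on="country", how="left")

# Aggregation: summarise detail into higher-level measures.
summary = enriched.groupby("region")["revenue"].agg(["sum", "mean"])
print(summary)
```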
Data Modelling
- Designing, constructing, and managing data models to ensure data can be referenced and related in a manner that supports business operations and processes.
- Creating high-level (Conceptual) models that outline the overall structure of the data and how different data elements relate to each other.
- Developing detailed (Logical) models that define the data elements, their attributes, and the relationships between them.
- Translating logical models into physical models that specify how the data will be stored in databases, including tables, columns, data types, and indexes (a minimal sketch follows this list).
- Mapping out how data moves through different systems and processes within the organisation using data flow diagrams.
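As a sketch of the logical-to-physical step, the DDL below materialises a hypothetical two-entity model (sqlite3 is used only so the example is self-contained; the tables, columns and index are invented):

```python
# Illustrative translation of a logical model into a physical one.
# sqlite3 keeps the sketch self-contained; the schema is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical model: tables, columns, data types and relationships.
    CREATE TABLE client (
        client_id   INTEGER PRIMARY KEY,
        client_name TEXT NOT NULL,
        country     TEXT NOT NULL
    );
    CREATE TABLE engagement (
        engagement_id INTEGER PRIMARY KEY,
        client_id     INTEGER NOT NULL REFERENCES client(client_id),
        start_date    TEXT,
        fee_amount    REAL
    );
    -- Index to support the common 'engagements by client' access path.
    CREATE INDEX idx_engagement_client ON engagement(client_id);
""")
conn.close()
```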
Data Quality Assurance
- Ensure the accuracy and integrity of data by developing validation methods and monitoring data quality (see the sketch after this list).
- Ensure that all deployed workflows meet the Definition of Done and have undergone peer review before being deployed to staging and production environments.
- Ensure that all data deliverables meet the business requirements as outlined in the acceptance criteria.
- Ensure robust, optimised data solutions by applying techniques such as stress testing.
- Ensure that test tasks and results are recorded and confirmed.
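A minimal sketch of rule-based validation, assuming invented check names and an invented invoice dataset:

```python
# Minimal data quality checks; rules and data are hypothetical.
import pandas as pd

data = pd.DataFrame({"invoice_id": [1, 2, 2],
                     "amount": [100.0, -5.0, 250.0]})

checks = {
    "no_duplicate_keys": data["invoice_id"].is_unique,
    "no_missing_amounts": data["amount"].notna().all(),
    "amounts_non_negative": (data["amount"] >= 0).all(),
}

failures = [name for name, passed in checks.items() if not passed]
if failures:
    # In a real workflow this would be logged and alerted on,
    # and the results recorded against the test tasks.
    raise ValueError(f"Data quality checks failed: {failures}")
```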
Security and Compliance
- Understanding of data governance and security policies and the application of secure coding principles to protect sensitive information.
- Adherence to PwC data design and development standards.
- Ensuring that accurate technical documentation is maintained.
- Ensuring that security by design is implemented.
- Ensuring that the principles of least-privilege access to data and data minimisation are applied.
- Ensuring that data privacy techniques such as anonymisation, aggregation and de-identification are applied (see the sketch after this list).
- Ensuring that data is encrypted in transit and at rest.
- Ensuring that appropriate change and release processes are followed.
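As one illustration of de-identification, direct identifiers can be replaced with salted one-way hashes. The salt handling below is deliberately simplified (in practice the salt would live in a secret manager, and the choice of technique would follow the applicable privacy policy):

```python
# Illustrative pseudonymisation of a direct identifier (hashlib).
# The salt value and identifier are hypothetical examples.
import hashlib

SALT = b"example-salt-stored-in-a-secret-manager"

def pseudonymise(identifier: str) -> str:
    """Return a one-way pseudonym for a direct identifier."""
    return hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()

print(pseudonymise("jane.doe@example.com"))
```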
Collaboration
- Working with product teams and LoS stakeholders to understand data needs and ensure data is accessible and usable.
- Attend design sessions to provide input into data requirements for solutions.
- Develop technical designs and technical steps to meet data requirements and provide timelines that feed into overall project delivery.
- Engage with local and global technical teams to identify dependencies and incorporate them into overall delivery plans and timelines.
Technical Mentorship and Training
- Act as a mentor to junior staff within the team.
- Provide input into the development of technical training curriculums.
- Provide technical input into data communities of interest and practice.
Desired Skills and Experience
- Strong analytical skills to troubleshoot and optimise data processes.
- Ability to collaborate with data scientists, analysts, and other LoS stakeholders to understand data needs and convey technical concepts clearly.
- Ability to ensure data accuracy and integrity through meticulous validation and monitoring.
- Technical skills:
- SQL
- T-SQL
- SSIS
- SSAS
- Database design
- Database security
- Database tuning
- Database monitoring
- Task automation/scheduling
- Data modelling
- Azure Data Lake
- Azure SQL (serverless, memSQL)
- Azure Synapse (Pipelines, SQL)
- API development (SOAP, JSON, Graph)
- Python (PySpark)
- Power BI
- Machine learning (understanding)
- Containerisation (understanding)