Job Description
Design, develop, and optimize data pipelines from source systems to end users. Review business requirements and understand business rules and transactional source data models. Review the performance of Extract, Load, Transform (ELT) pipelines with developers and suggest improvements. Create end-to-end integration tests for data pipelines and improve the pipelines accordingly. Translate requirements into clear design specifications and present solutions for team review, incorporating feedback from the team lead and team members. Conduct knowledge transfer and training sessions, ensuring staff receive the knowledge required to support and improve the systems. Develop documentation and supporting materials as part of reviews and knowledge transfer.
Requirements/Must Haves
- Demonstrated fluency in Python, with knowledge of best practices and coding conventions for building scalable data pipelines.
- Experience with data pipeline and workflow development, orchestration, deployment, and automation.
- Experience with cloud data platforms, data management, and data exchange tools and technologies.
- Experience with commercial and open-source databases and data storage management, including Data as a Service (DaaS), Database as a Service (DBaaS), and Data Warehouse as a Service (DWaaS) offerings.
- Proficiency in SQL and Python, with hands-on experience using Databricks and Spark SQL for data modeling and transformation (an illustrative sketch follows this list).
- Understanding of iterative product development cycles (Client, Agile, Beta, Live).
- Experience with version-controlled, shared codebases using Git-based tools (Azure DevOps, GitHub, Bitbucket), including participation in pull-request code reviews.
- Experience in Continuous Integration/Continuous Deployment (CI/CD) and data provisioning automation.
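For context, the kind of Spark SQL data modeling and transformation referenced above might look like the minimal sketch below. It is purely illustrative: it assumes a Databricks-style environment where Delta is available (or delta-spark installed locally), and the table and column names (raw_orders, fact_orders, and their fields) are hypothetical, not part of this posting.

```python
# Illustrative only: hypothetical source/target tables and columns.
# Assumes a Spark environment such as Databricks with Delta available.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("order_fact_load").getOrCreate()

# Expose the transactional source as a view and reshape it with Spark SQL
# into a simple fact table for downstream reporting.
spark.table("raw_orders").createOrReplaceTempView("orders_src")

order_fact = spark.sql("""
    SELECT
        o.order_id,
        o.customer_id,
        CAST(o.order_ts AS DATE)  AS order_date,
        o.quantity * o.unit_price AS gross_amount
    FROM orders_src o
    WHERE o.status = 'COMPLETED'
""")

# Persist the result as a Delta table for end users and later pipeline stages.
order_fact.write.format("delta").mode("overwrite").saveAsTable("fact_orders")
```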
Responsibilities
- Design and develop data pipelines from source to end user.
- Optimize data pipelines.
- Review business requirements and translate them into clear design specifications.
- Conduct performance reviews of ELT pipelines and recommend improvements.
- Create and execute integration tests for data pipelines.
- Deliver knowledge transfer sessions and prepare learning materials.
- Collaborate in a team environment to review, refine, and implement solutions.
Skills
Technical Experience (50%)
- Proficiency in SQL and Python with Databricks and Spark SQL.
- Ability to design automated data quality checks using Python, SQL, and frameworks such as Great Expectations or Soda (a minimal example follows this list).
- Experience with performance monitoring and tuning of data pipelines and stores.
- Experience with fact/dimension models, data mapping, data warehouses, data lakes, and data lakehouses.
- Skilled in managing structured, semi-structured, and unstructured data ingestion and exchange.
- Experience with open file formats and optimizing pipelines using Parquet, Delta, or Iceberg.
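To illustrate the automated data quality checks named above, the sketch below hand-rolls two rules in Python and SQL against the hypothetical fact_orders table from the earlier example; in practice a framework such as Great Expectations or Soda would express equivalent rules declaratively and handle reporting.

```python
# Illustrative only: hand-rolled quality rules over a hypothetical table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fact_orders_quality").getOrCreate()

# Each rule is a SQL query that counts violating rows; zero means the rule passes.
checks = {
    "no_null_keys": """
        SELECT COUNT(*) AS failures
        FROM fact_orders
        WHERE order_id IS NULL OR customer_id IS NULL
    """,
    "non_negative_amounts": """
        SELECT COUNT(*) AS failures
        FROM fact_orders
        WHERE gross_amount < 0
    """,
}

results = {name: spark.sql(query).first()["failures"] for name, query in checks.items()}
for name, count in results.items():
    print(f"{name}: {count} failing rows")

# Fail the pipeline run if any rule is violated.
assert all(count == 0 for count in results.values()), f"Data quality failures: {results}"
```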
Cloud Knowledge and Experience (25%)
- Hands-on experience with cloud data platforms, repositories, and data lake/lakehouse solutions.
- Expertise in DaaS, DBaaS, DWaaS, and other storage platforms in cloud and on-premises environments.
- Experience managing cloud data services for project delivery, including storage, key vaults, and virtual environments (see the sketch after this list).
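As a small, hedged illustration of the key vault and storage services mentioned in the last bullet, the snippet below uses the Azure SDK for Python to read a storage connection string from a key vault rather than hard-coding it; the vault URL and secret name are placeholders, and other cloud providers offer equivalent services.

```python
# Illustrative only: placeholder vault URL and secret name.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()
vault = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net",
    credential=credential,
)

# Pull the connection string from the vault instead of embedding it in code or config.
conn_str = vault.get_secret("storage-connection-string").value
blob_service = BlobServiceClient.from_connection_string(conn_str)

# Quick sanity check that the credential and connection work.
for container in blob_service.list_containers():
    print(container.name)
```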
Agile Product Development (25%)
- Experience in agile, sprint-based environments.
- Strong understanding of iterative product development cycles.
- Collaboration on shared codebases with Git tools.
- Experience in CI/CD, automation, and defining and executing tests across the development lifecycle (a minimal test sketch follows this list).
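To show the kind of test definition and execution a CI/CD pipeline (for example, a pull-request build in Azure DevOps or GitHub Actions) would run automatically, here is a minimal pytest-style sketch against a local Spark session; it reuses the hypothetical order transformation from earlier and is not part of this posting.

```python
# Illustrative only: a pytest integration-style test run by CI on each pull request.
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="module")
def spark():
    return SparkSession.builder.master("local[1]").appName("pipeline-tests").getOrCreate()


def test_completed_orders_are_transformed(spark):
    # Arrange: a tiny in-memory version of the source view.
    spark.createDataFrame(
        [("o1", "c1", "2024-01-01", 2, 10.0, "COMPLETED"),
         ("o2", "c1", "2024-01-02", 1, 5.0, "CANCELLED")],
        ["order_id", "customer_id", "order_ts", "quantity", "unit_price", "status"],
    ).createOrReplaceTempView("orders_src")

    # Act: apply the same Spark SQL transformation the pipeline uses.
    result = spark.sql(
        "SELECT order_id, quantity * unit_price AS gross_amount "
        "FROM orders_src WHERE status = 'COMPLETED'"
    ).collect()

    # Assert: cancelled orders are excluded and amounts are computed correctly.
    assert len(result) == 1
    assert result[0]["gross_amount"] == 20.0
```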