Roles and Responsibilities
- Data Pipeline Development: Design, build, and maintain robust, scalable data pipelines using tools like Apache Airflow, Kafka, and Spark.
- Data Architecture: Develop and optimize data models, data warehouses, and ETL processes to support analytics and machine learning initiatives.
- Collaboration: Work closely with data scientists, analysts, and software engineers to understand data requirements and deliver tailored solutions.
- Data Quality & Governance: Ensure data integrity, implement validation checks, and enforce data governance policies.
- Performance Optimization: Monitor and improve data processing speed, system uptime, and query performance.
- Leadership & Mentorship: Guide junior engineers, promote best practices, and foster a culture of continuous learning.
- Innovation & Research: Stay updated with emerging technologies in AI, automation, and data streaming; lead initiatives to integrate them into existing systems.
Desired Skills and Experience
- Bachelor's or Master's degree in Computer Science, Data Science, Information Technology, or a related field, with 6+ years of experience.
- Proficiency in SQL, Python, and ETL/ELT tools
- Experience with the following Microsoft solutions:
  - Azure Data Factory
  - Microsoft Fabric
  - Azure Synapse Analytics
  - Azure Databricks
  - Power BI
  - Azure SQL Database / SQL Server
  - Azure Purview
- Experience with big data technologies (Hadoop, Spark)
- Familiarity with cloud platforms (Azure, AWS, or GCP)
- Strong understanding of data modelling, data warehousing, and stream processing
- Knowledge of data security, compliance, and governance (e.g., GDPR, HIPAA)
- Knowledge of CI/CD pipelines using Azure DevOps