Experience (Years) : 8-10
- Strong Python programming experience (3+ years)
- Advanced proficiency with pandas for data manipulation and analysis
- Experience designing and maintaining production ETL workflows
- Solid understanding of data modeling concepts and techniques
- Familiarity with SQL and relational database concepts
- Version control systems (Git) and collaborative development practices
- Problem-solving mindset with attention to detail
- Strong communication skills to explain technical concepts clearly
- Commitment to code quality and testing practices
- Ability to balance competing priorities in a fast-paced environment
- Experience with DevOps tools such as Git, Jenkins, Jira, Confluence, and similar technologies
- Familiarity with Messaging Queue systems such as Solace
- Experience working with Cloud technologies and developing applications for cloud deployment via Kubernetes / Docker
- Familiar with micro-services architecture and API management
- Experience with Polars for high-performance data processing
- Familiarity with DuckDB for analytical query processing (both illustrated in the first sketch after this list)
- Knowledge of cloud-based data platforms (AWS, GCP, or Azure)
- Experience with orchestration tools like Airflow, Luigi, or Dagster (see the second sketch after this list)
- Understanding of data governance and compliance requirements
- Contributions to open-source data projects
- Experience building applications delivered and executed in the cloud (e.g. AWS, GCP)
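
To illustrate the Polars and DuckDB items above, here is a minimal sketch of a high-performance aggregation; the file name events.parquet and the column names are hypothetical, and the calls assume recent versions of both libraries:

    import duckdb
    import polars as pl

    # Hypothetical input: a Parquet file of raw events (path and columns are illustrative)
    events = pl.read_parquet("events.parquet")

    # Polars: lazy, columnar aggregation for high-performance processing
    daily = (
        events.lazy()
        .filter(pl.col("status") == "ok")
        .group_by("event_date")
        .agg(pl.col("amount").sum().alias("total_amount"))
        .collect()
    )

    # DuckDB: analytical SQL directly over the Polars frame (via Arrow)
    top_days = duckdb.sql(
        "SELECT event_date, total_amount FROM daily "
        "ORDER BY total_amount DESC LIMIT 10"
    ).pl()
    print(top_days)

Keeping the heavy computation lazy in Polars and pushing ad-hoc analytics to DuckDB avoids materializing intermediate copies of the data.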
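Likewise, for the orchestration item, a minimal Airflow 2.x TaskFlow sketch; the DAG id, schedule, and task bodies are placeholders, not a prescribed pipeline:

    from datetime import datetime

    from airflow.decorators import dag, task

    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def example_etl():
        @task
        def extract() -> list[dict]:
            # Placeholder extract step; a real task would pull from a source system
            return [{"id": 1, "amount": 10.0}]

        @task
        def transform(rows: list[dict]) -> list[dict]:
            # Placeholder transform step: drop non-positive amounts
            return [r for r in rows if r["amount"] > 0]

        @task
        def load(rows: list[dict]) -> None:
            # Placeholder load step; a real task would write to a warehouse
            print(f"loaded {len(rows)} rows")

        load(transform(extract()))

    example_etl()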
Role Description :
- Architect and build efficient ETL pipelines in Python to process diverse data sources
- Implement data transformation logic using pandas for complex manipulations and aggregations (see the sketch after this list)
- Document pipeline architecture, data flows, and transformation processes
- Collaborate with data scientists and business stakeholders to understand data requirements
- Optimize existing pipelines for performance and maintainability
- Implement data quality checks and monitoring solutions
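
As a minimal sketch of the pandas transformation and data quality items above; the source file, column names, and validation rules are illustrative assumptions:

    import pandas as pd

    # Hypothetical extract step: path and columns are illustrative
    orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

    # Data quality checks before transforming (assumed rules, for illustration)
    if not orders["order_id"].is_unique:
        raise ValueError("duplicate order_id values found")
    if (orders["amount"] < 0).any():
        raise ValueError("negative amounts found")
    missing = int(orders["customer_id"].isna().sum())
    if missing:
        raise ValueError(f"{missing} rows missing customer_id")

    # Transformation: monthly revenue and order counts per customer
    monthly = (
        orders
        .assign(month=orders["order_date"].dt.to_period("M"))
        .groupby(["customer_id", "month"], as_index=False)
        .agg(revenue=("amount", "sum"), order_count=("order_id", "count"))
    )

    # Load step depends on the target system, e.g. monthly.to_parquet(...)

Failing fast on quality violations before the transform keeps bad records out of downstream tables and makes pipeline errors easier to trace.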