Plan & execute the migration to a company-wide data lake together with your direct manager and the architecture team
Gradually develop & deploy ETL pipelines based on business requirements for alerting, reporting and insights
Ensure data integrity and versioning across all data storage systems
Work closely with the AI & ML team to provide curated datasets ready for encoding and analysis
Perform maintenance on existing tools (Elasticsearch) and fix any outstanding bugs
Follow internal Agile/Scrum company practices to organize work
Must have:
1 to 4 years of experience working with Scala, Java, or Python (or any other popular language) in a production environment (all levels considered)
Previous experience working with large datasets and setting up ETL pipelines
Familiarity with Linux-based operating systems and experience with version control systems (Git)
Familiarity with both SQL & NoSQL data stores
Passion for working with data
Our tech stack: Python, Apache Spark, Hive, PySpark, the PyData stack, SQL, Airflow, Databricks, Snowflake, dbt, Kafka, Kedro (our own open-source data pipelining framework), Dask/RAPIDS, and container technologies such as Docker and Kubernetes
Strong knowledge of programming paradigms and architectural concepts, including object-oriented and functional programming, microservices, OLTP, OLAP, Lakehouse, and Data Mesh
Well-versed in the SDLC, applying software engineering best practices (including DevOps, DataOps, and MLOps) to drive enterprise-wide improvement
Ability to write clean, maintainable, scalable and robust code in a common language (e.g., Python, Scala, Java)
Proven experience with distributed computing frameworks (e.g., Spark, Dask), cloud platforms (e.g., AWS, Azure, GCP), containerization, and analytics libraries (e.g., pandas, NumPy, Matplotlib); ability to troubleshoot distributed jobs, diagnose issues, and implement solutions
Familiarity with orchestration frameworks and tools (e.g., Airflow, Luigi, Azure Data Factory, AWS Step Functions, Databricks Jobs, Oozie)
Ability to scope projects and define technical workstreams with clear milestones and deliverables
Knowledge of leading analytics software on the market (e.g., Power BI, Qlik, Alteryx, Talend)