Responsibilities:
- Assembling large, complex datasets that meet both functional and non-functional business requirements.
- Designing, developing, monitoring, and maintaining scalable data pipelines and ETL processes.
- Building infrastructure for optimal extraction, transformation, and loading of data from various sources using integration and SQL technologies, often cloud-based.
- Identifying, designing, and implementing internal process improvements, including infrastructure redesign for scalability, optimizing data delivery, and automating manual processes.
- Building analytical tools to utilize data pipelines, providing actionable insights into key business metrics.
- Ensuring data quality, consistency, integrity, and security across systems.
- Driving continuous improvement of data engineering practices and tooling.
Required Skills and Experience:
- Bachelor's Degree in Computer Science, Engineering, Mathematics, or related field.
- 5-7 years of experience in database management, data engineering, or similar roles.
- Proficiency in programming languages such as Python or Scala.
- Strong proficiency in SQL and experience with relational databases (e.g., Microsoft SQL Server, PostgreSQL, MySQL).
- Hands-on experience with NoSQL database technologies.
- Experience in database optimization and performance tuning.
- Good understanding of data integration patterns.
- Exposure to BI tools such as Power BI or Yellowfin is advantageous.
- Experience setting up SQL Server replication and data archiving strategies.
- Experience with cloud platforms (AWS, GCP, Azure) and services like S3, Lambda, Redshift, BigQuery, or Snowflake.
- Familiarity with big data technologies such as Apache Spark, Databricks, and Hive.
- Knowledge of data modeling, warehousing concepts, and data governance.
- Exposure to data cleansing and de-duplication techniques is beneficial.
Advantageous:
- Experience with stream processing tools (Apache Kafka, Spark Structured Streaming, Apache Flink).
- Knowledge of containerization (Docker) and orchestration tools (Kubernetes).
- Understanding of CI/CD principles and infrastructure-as-code.
- Exposure to machine learning workflows and MLOps.