Job Overview
As a Data Engineer, you will support the Data Engineering team in setting up the Data Lake on the cloud and implementing a standardized Data Model that provides a single view of the customer.
You will develop data pipelines for new sources and data transformations within the Data Lake, implement GraphQL APIs, work with NoSQL databases, and handle CI/CD and data delivery as per business requirements.
Responsibilities
- Build pipelines to bring in a wide variety of data from multiple sources within the organization as well as from social media and public data sources.
- Collaborate with cross-functional teams to source data and make it available for downstream consumption.
- Work with the team to provide an effective solution design that meets business needs.
- Ensure regular communication with key stakeholders; understand any key concerns about how the initiative is being delivered and any risks/issues that have either not yet been identified or are not being progressed.
- Ensure dependencies and challenges (risks) are escalated and managed; escalate critical issues to the Sponsor and/or the Head of the Data Engineering team.
- Ensure timelines (milestones, decisions, and delivery) are managed and achieved without compromising quality and within budget.
- Ensure an appropriate and coordinated communications plan, both internal and external, is in place for initiative execution and delivery.
- Ensure final handover of the initiative to business-as-usual processes, carry out a post-implementation review (as necessary) to confirm that initiative objectives have been delivered, and ensure that lessons learnt are incorporated into future processes.
Qualifications
Who we are looking for:
Competencies & Personal Traits
- Expertise in Databricks
- Experience with at least one cloud infrastructure provider (Azure/AWS)
- Experience in building batch data pipelines with Apache Spark (Spark SQL, DataFrame API) or Hive Query Language (HQL); a minimal batch sketch follows this list
- Experience in building streaming data pipelines using Apache Spark Structured Streaming or Apache Flink on Kafka and a Data Lake; a streaming sketch also follows this list
- Knowledge of NoSQL databases
- Expertise in Cosmos DB, RESTful APIs, and GraphQL
- Knowledge of big data ETL processing tools, data modelling, and data mapping
- Experience with Hive and Hadoop file formats (Avro/Parquet/ORC)
- Basic knowledge of scripting (shell/bash)
- Experience working with multiple data sources, including relational databases (SQL Server/Oracle/DB2/Netezza), NoSQL/document databases, and flat files
- Experience with CI/CD tools such as Jenkins, JIRA, Bitbucket, Artifactory, Bamboo, and Azure DevOps
- Basic understanding of DevOps practices using Git version control
- Ability to debug, fine-tune, and optimize large-scale data processing jobs
- Excellent problem analysis skills
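To illustrate the batch-pipeline skills above, here is a minimal sketch of a Spark batch job using the DataFrame API; the paths and column names (customer_id, email) are hypothetical examples for illustration, not part of any actual codebase.

```python
# Minimal PySpark batch sketch: read a raw extract, clean it, write Parquet.
# All paths and column names here are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch-ingest-sketch").getOrCreate()

# Read a raw CSV extract from a hypothetical landing zone.
raw = spark.read.option("header", True).csv("/landing/crm/customers.csv")

# Normalize emails and deduplicate on the customer key, moving toward a
# single view of the customer.
cleaned = (
    raw.withColumn("email", F.lower(F.trim(F.col("email"))))
       .dropDuplicates(["customer_id"])
)

# Persist to the Data Lake in a columnar format (Parquet, per the list above).
cleaned.write.mode("overwrite").parquet("/lake/curated/customers")

spark.stop()
```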
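Likewise, a minimal Spark Structured Streaming sketch consuming a Kafka topic into the Data Lake; the broker address, topic name, and checkpoint/output paths are assumptions for illustration.

```python
# Minimal Structured Streaming sketch: Kafka topic -> Data Lake (Parquet).
# Broker, topic, and paths are hypothetical; the spark-sql-kafka connector
# must be on the classpath for the "kafka" source to resolve.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stream-ingest-sketch").getOrCreate()

# Subscribe to a hypothetical Kafka topic.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "customer-events")
         .load()
)

# Kafka delivers key/value as binary; cast the payload to string for parsing.
decoded = events.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("payload"),
    "timestamp",
)

# Append micro-batches to the lake; the checkpoint makes the file sink
# restartable with exactly-once output.
query = (
    decoded.writeStream.format("parquet")
           .option("path", "/lake/raw/customer_events")
           .option("checkpointLocation", "/lake/_checkpoints/customer_events")
           .outputMode("append")
           .start()
)

query.awaitTermination()
```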
Experience
- 5+ years (no upper limit) of experience working with enterprise IT applications in cloud platform and big data environments.
Professional Qualifications
Certifications related to Data and Analytics would be an added advantage.