Python ETL Developer / Data Engineer - Remote
ipvisibility
Ottawa
Remote
CAD 70,000 - 90,000
Full time
30+ days ago
Job summary
A technology services company in Ottawa seeks a skilled ETL Developer to design and optimize data integration processes. Responsibilities include developing ETL jobs and tuning performance for efficient data analysis. The ideal candidate has experience with Data Lakes and writing queries in Hive or Impala, along with a strong understanding of ETL principles.
Qualifications
- Experience in reviewing and designing ETL jobs.
- Strong knowledge of data integration with business applications.
- Proficiency in writing queries in Hive or Impala.
Responsibilities
- Design and develop ETL processes for data integration.
- Optimize performance of ETL jobs and queries.
- Parse and process large historical XML data files.
Skills
ETL development
Data Lake
Hive
Python
Performance tuning
Overview
Job Description
Specific Duties
- Review, design, and develop ETL jobs to ingest data into the Data Lake, load data into data marts, and extract data for integration with various business applications.
- Parse unstructured and semi-structured data (e.g., XML).
- Design and develop efficient mappings and workflows to load data into data marts.
- Map XML DTD schemas to customized table definitions in Python.
- Write efficient queries and reports in Hive or Impala to extract data on an ad hoc basis for data analysis.
- Identify performance bottlenecks in ETL jobs and resolve them by enhancing or redesigning the jobs.
- Tune the performance of ETL mappings and queries.
- Import tables and all necessary lookup tables to support the ETL processes for daily XML files, as well as for the very large (multi-terabyte) historical XML data files.