Python Data Platform Engineer

Compunnel, Inc.

Montreal

On-site

CAD 80,000 - 120,000

Full-time

30+ days ago

Summary

An established industry player is seeking a Python Data Platform Engineer to join their dynamic team. This role involves building a next-gen data platform that streamlines data sourcing and storage from various technology systems. You will contribute to developing a unified data pipeline framework using Python and key technologies like Airflow, DBT, and Snowflake. Collaborating with cross-functional teams, you will ensure the platform meets the needs of Technology Risk functions, enhancing reporting and analytics capabilities. This is an exciting opportunity to make a significant impact in a complex data environment while leveraging your expertise in data engineering and analytics.

Qualifications

  • 7+ years of experience in data development in complex environments.
  • Strong SQL skills with experience in writing complex queries.
  • Experience with data pipelines and warehousing solutions.

Responsibilities

  • Develop components of a unified data pipeline framework in Python.
  • Establish best practices for using Airflow, DBT, and Snowflake.
  • Monitor performance of queries and perform necessary tuning.

Skills

Python
SQL/PLSQL
Data Pipeline Development
Airflow
Snowflake
Apache Spark
Data Warehousing
Analytical Skills
Problem-Solving Skills
Communication Skills

Education

Bachelor's degree in Computer Science

Tools

Airflow
DBT
Snowflake
Apache Spark
Pandas
NumPy
PySpark

Job Description

Job Duties:
As a Python Data Platform Engineer, you will be a member of the C3 Data Warehouse team within the Controls Engineering, Measurement and Analytics (CEMA) department, with a focus on building our next-gen data platform. This platform sources data from different technology systems across the firm and stores it centrally, powering various reporting and analytics solutions for the Technology Risk functions within Morgan Stanley. In this role, you will be primarily responsible for contributing to the development of a unified data pipeline framework written in Python, using technologies such as Airflow, DBT, Spark, and Snowflake. You will also contribute to the integration of this framework with existing internal platforms for data quality, data cataloging, data discovery, incident logging, and metric generation. You will work closely with data warehousing leads, data analysts, ETL developers, infrastructure engineers, and data analytics teams to facilitate the implementation of this data platform and data pipeline framework.

KEY RESPONSIBILITIES:
• To develop various components in Python of our unified data pipeline framework.
• To contribute towards the establishment of best practices for the optimal and efficient usage of Airflow, DBT and Snowflake.
• To assist with the testing and deployment of our data pipeline framework utilizing standard testing frameworks and CI/CD tooling.
• To monitor the performance of queries and data loads and perform tuning as necessary.
• To provide assistance and guidance during the QA & UAT phases to quickly confirm the validity of potential issues and to determine the root cause and best resolution of verified issues.

Minimum Skills Required:
• Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or related field required.
• At least 7 years of experience in data development and solutions in highly complex data environments with large data volumes.
• At least 7 years of SQL / PLSQL experience with the ability to write ad-hoc and complex queries to perform data analysis.
• At least 5 years of experience developing data pipelines and data warehousing solutions using Python and libraries such as Pandas, NumPy, PySpark, etc.
• At least 3 years of experience developing solutions in a hybrid data environment (on-premises and cloud).
• At least 3 years of experience developing Airflow DAGs to orchestrate data pipelines that utilize branching, dynamic DAG / task generation, and error handling.
• Hands-on experience developing data pipelines for structured, semi-structured, and unstructured data, and experience integrating with their supporting stores (e.g. RDBMS, NoSQL DBs, Document DBs, Log Files, etc.).
• Hands-on experience with Snowflake is a must.
• Hands-on experience with Apache Spark is a must.
• Hands-on experience with DBT is preferred.
• Experience with performance tuning of SQL queries, Spark jobs, and stored procedures.
• An understanding of E-R data models (conceptual, logical, and physical).
• Understanding of advanced data warehouse concepts (Factless Fact Tables, Temporal / Bi-Temporal models, etc.) is a plus.
• Strong analytical skills, including a thorough understanding of how to interpret customer business requirements and translate them into technical designs and solutions.
• Strong communication skills, both verbal and written. Capable of collaborating effectively across a variety of IT and business groups, regions, and roles, and able to interact effectively with all levels.
• Self-starter. Proven ability to manage multiple concurrent projects with minimal supervision. Can manage a complex, ever-changing priority list and resolve conflicts between competing priorities.
• Strong problem-solving skills. Ability to identify where focus is needed and bring clarity to business objectives, requirements, and priorities.
