Python PySpark Data Engineer with AI, Control-M

Astra North Infoteck Inc.

Toronto

On-site

CAD 100,000 - 130,000

Full time

20 days ago

Job summary

A leading data solutions company is seeking a Data Engineer with extensive experience in Python, Spark, and Databricks. The role involves designing scalable data pipelines, managing large-scale Spark workloads on Databricks, and collaborating with data science and business teams to enable data-driven decision-making. Ideal candidates will have a strong background in data engineering and automation practices.

Qualifications

  • 8-10 years of experience in data engineering.
  • Strong experience with Python and Spark (PySpark).
  • Hands-on with Databricks and SQL for complex data transformations.

Responsibilities

  • Design, build, and optimize scalable data pipelines.
  • Deploy and manage large-scale Spark workloads on Databricks.
  • Collaborate with data scientists and business stakeholders.

Skills

Python
Spark (PySpark)
SQL
Databricks
Machine Learning workflows

Tools

Control-M
Snowflake
Airflow

Job description

Keywords: Python and Spark (PySpark), Databricks (Jobs, Workflows, Delta Lake, Unity Catalog), SQL

Role Description:

  • Design, build, and optimize scalable data pipelines (see the illustrative sketch after this list)
  • Develop and operationalize data products across structured and unstructured sources, including alternative data
  • Deploy, manage, and performance-tune large-scale Spark workloads on Databricks, ensuring reliability, scalability, and cost-efficiency
  • Collaborate with data scientists, quant teams, and business stakeholders to enable data-driven decision-making
  • Contribute to automation efforts via CI/CD pipelines, infrastructure-as-code, and reusable data frameworks
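
By way of illustration, a minimal PySpark job of the kind described in this list, reading a raw table, applying basic cleansing, and writing a partitioned Delta table on Databricks, might look like the sketch below; the table and column names (raw_events, events_clean, event_id, event_ts) are hypothetical placeholders.

    # Minimal sketch of a Delta Lake pipeline step on Databricks; all names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("events-pipeline").getOrCreate()

    # Read a raw source table, apply basic cleansing, and write a partitioned Delta table.
    raw = spark.read.table("raw_events")
    clean = (
        raw.filter(F.col("event_ts").isNotNull())
           .withColumn("event_date", F.to_date("event_ts"))
           .dropDuplicates(["event_id"])
    )
    (clean.write
          .format("delta")
          .mode("overwrite")
          .partitionBy("event_date")
          .saveAsTable("events_clean"))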

Competencies: Python Web Frameworks, Databricks, PySpark, Control-M (Workload Scheduling and Automation, Administration)

Experience (Years): 8-10

  • Strong experience with Python and Spark (PySpark)
  • Hands-on with Databricks (Jobs, Workflows, Delta Lake, Unity Catalog)
  • Proficient in SQL for complex data transformations and optimizations
  • Solid understanding of distributed data processing and production-grade data workflows
  • Exposure to Machine Learning workflows and tools like MLflow
  • Experience working with Alternative Data sources (e.g., web data, geospatial, satellite, social sentiment)
  • Familiarity with Snowflake, Airflow, or similar orchestration and warehousing platforms (a brief orchestration sketch follows this list)
  • Understanding of CI/CD principles, version control, and production deployment best practices
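
For orientation on the orchestration side, a minimal Airflow DAG that schedules a daily Spark submission might look like the sketch below; it assumes Airflow 2.x, and the DAG name, task name, and command are hypothetical.

    # Minimal Airflow DAG sketch; dag_id, task_id, and the command are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="events_pipeline_daily",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # Airflow 2.4+; earlier 2.x versions use schedule_interval
        catchup=False,
    ) as dag:
        run_spark_job = BashOperator(
            task_id="run_spark_job",
            bash_command="spark-submit events_pipeline.py",
        )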