Job Description:
Responsibilities:
- Develop, optimize, and maintain ETL/ELT pipelines using PySpark and SQL.
- Work with structured and unstructured data to build scalable data solutions.
- Write efficient and scalable PySpark scripts for data transformation and processing.
- Optimize SQL queries, stored procedures, and indexing strategies to enhance performance.
- Design and implement data models, schemas, and partitioning strategies for large-scale datasets.
- Collaborate with Data Scientists, Analysts, and other Engineers to integrate data workflows.
- Ensure data quality, validation, and consistency in data pipelines.
- Implement error handling, logging, and monitoring for data pipelines.
- Work with cloud platforms (AWS, Azure, or GCP) for data processing and storage.
- Optimize data pipelines for cost efficiency and performance.
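For illustration only, a minimal sketch of the kind of PySpark transformation work described above; the paths, table names, and columns are hypothetical, not part of this role's actual codebase:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical example: read raw orders, apply basic data-quality rules,
# and write a partitioned curated table.
spark = SparkSession.builder.appName("orders_etl").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/raw/orders/")  # hypothetical input path

cleaned = (
    raw
    .dropDuplicates(["order_id"])                      # basic deduplication for data quality
    .filter(F.col("amount").isNotNull())               # drop rows missing a required field
    .withColumn("order_date", F.to_date("order_ts"))   # derive a partition column
)

(
    cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")                          # partitioning strategy for large datasets
    .parquet("s3://example-bucket/curated/orders/")     # hypothetical output path
)
```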
Technical Skills Required:
- Strong experience in Python for data engineering tasks.
- Proficiency in PySpark for large-scale data processing.
- Deep understanding of SQL (Joins, Window Functions, CTEs, Query Optimization).
- Experience in ETL/ELT development using Spark and SQL.
- Experience with cloud data services (AWS Glue, Databricks, Azure Synapse, GCP BigQuery).
- Familiarity with orchestration tools (Apache Airflow, Apache Oozie).
- Experience with data warehousing (Snowflake, Redshift, BigQuery).
- Understanding of performance tuning in PySpark and SQL.
- Familiarity with version control (Git) and CI/CD pipelines.
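As a hedged example of the SQL depth expected (CTEs and window functions), the snippet below runs a ranking query through PySpark; it assumes a table or temp view named `sales` with hypothetical columns is already registered:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql_example").getOrCreate()

# Hypothetical table: sales(customer_id, order_id, order_ts, amount),
# assumed to be registered as a table or temporary view.
# CTE plus a window function: latest order per customer, ranked by recency.
latest_orders = spark.sql("""
    WITH ranked AS (
        SELECT
            customer_id,
            order_id,
            amount,
            ROW_NUMBER() OVER (
                PARTITION BY customer_id
                ORDER BY order_ts DESC
            ) AS rn
        FROM sales
    )
    SELECT customer_id, order_id, amount
    FROM ranked
    WHERE rn = 1
""")

latest_orders.show()
```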