Senior Data Engineer

Bhasaka Technologies

Pune District, Bengaluru, Hyderabad

Hybrid

INR 15,00,000 - 25,00,000

Full-time


Job summary

A leading data engineering company in India is looking for a highly experienced Senior Data Engineer to design, build, and maintain scalable data pipelines. The ideal candidate will have strong expertise in PySpark, 5+ years of experience in data engineering, and proficiency in SQL and Python. This role involves collaborating with various teams to ensure data quality and governance.

Qualifications

  • 5+ years of experience in data engineering roles.
  • Strong hands-on experience with PySpark for data transformation and processing.
  • Deep understanding of distributed computing concepts, data partitioning, and performance tuning.

Responsibilities

  • Design, implement, and maintain scalable ETL/ELT pipelines using PySpark.
  • Collaborate closely with data scientists, analysts, and business stakeholders.
  • Optimize PySpark jobs for distributed environments.

Skills

PySpark
SQL
Python
Distributed computing concepts
Data engineering

Tools

Apache Spark
Hadoop
Delta Lake
Apache Airflow

Job description
Senior Data Engineer - PySpark

Location: Pune / PAN India
Job Type: Full-time
Experience: 5+ years

Job Summary

We are looking for a highly experienced Senior Data Engineer with strong expertise in PySpark and large-scale data processing to join our data engineering team. In this role, you will be responsible for designing, building, and maintaining advanced data pipelines that support enterprise-level analytics and data science initiatives.

Key Responsibilities
  • Design, implement, and maintain scalable and efficient ETL/ELT pipelines using PySpark (a minimal sketch follows this list).
  • Work with big data frameworks such as Apache Spark, Hadoop, Hive, and Delta Lake.
  • Handle large volumes of structured and unstructured data across various sources (databases, APIs, flat files, etc.).
  • Develop and optimize complex data transformation workflows and batch/streaming jobs.
  • Ensure data quality, integrity, and governance throughout the data lifecycle.
  • Collaborate closely with data scientists, analysts, DevOps, and business stakeholders.
  • Troubleshoot performance issues and optimize PySpark jobs for distributed environments.
  • Manage workflows using orchestration tools such as Apache Airflow or similar.
  • Contribute to the architecture and design of scalable, fault-tolerant data platforms.
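
By way of illustration, here is a minimal sketch of the kind of PySpark ETL pipeline this role involves. The paths, column names, and app name are hypothetical placeholders, and the Delta write assumes the delta-spark package is installed; this is an example under those assumptions, not a description of our actual pipelines.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_daily_etl").getOrCreate()

    # Extract: read raw order events landed by an upstream system.
    raw = spark.read.option("header", True).csv("s3://raw-zone/orders/")

    # Transform: fix types, deduplicate, and build a daily revenue aggregate.
    orders = (
        raw.withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("amount", F.col("amount").cast("double"))
           .dropDuplicates(["order_id"])
           .filter(F.col("amount").isNotNull())
    )
    daily_revenue = (
        orders.groupBy(F.to_date("order_ts").alias("order_date"))
              .agg(F.sum("amount").alias("revenue"))
    )

    # Load: write the curated table, partitioned by date, in Delta format.
    (daily_revenue.write.format("delta").mode("overwrite")
                  .partitionBy("order_date")
                  .save("s3://curated-zone/daily_revenue/"))
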
Required Skills & Qualifications
  • 5+ years of experience in data engineering roles.
  • Strong hands-on experience with PySpark for data transformation and processing.
  • Proficient in SQL and Python.
  • Deep understanding of distributed computing concepts, data partitioning, and performance tuning (illustrated in the sketch after this list).
  • Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
  • Experience with data lakehouse architectures (e.g., Delta Lake, Databricks).
  • Knowledge of version control (Git), CI/CD, and agile methodologies.
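
To make the partitioning and tuning expectations concrete, a small illustrative example; the dataset contents and partition count are stand-ins, and real tables would be far larger.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("tuning_demo").getOrCreate()

    # Tiny stand-in datasets; in production these would be large tables.
    events = spark.createDataFrame(
        [(1, "click"), (2, "view"), (1, "view")], ["customer_id", "event"]
    )
    customers = spark.createDataFrame(
        [(1, "IN"), (2, "US")], ["customer_id", "country"]
    )

    # Repartition on the join key so shuffle output is evenly distributed.
    events = events.repartition(8, "customer_id")

    # Broadcast the small dimension table to avoid a shuffle join entirely.
    enriched = events.join(broadcast(customers), "customer_id")
    enriched.show()
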
Nice to Have
  • Exposure to streaming data technologies like Kafka, Spark Streaming, or Flink (see the streaming sketch after this list).
  • Experience with Snowflake, Redshift, or BigQuery.
  • Knowledge of data governance, data lineage, and cataloging tools (e.g., Collibra, Alation).
  • Familiarity with containerization (Docker, Kubernetes).
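
For the streaming exposure above, a bare-bones Structured Streaming sketch reading from Kafka; the broker address and topic name are placeholders, and the spark-sql-kafka connector must be on the classpath.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka_stream_demo").getOrCreate()

    # Subscribe to a Kafka topic; Kafka delivers the payload as bytes.
    stream = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("subscribe", "orders")
             .load()
    )

    # Cast the binary value to a string before any downstream parsing.
    parsed = stream.select(F.col("value").cast("string").alias("payload"))

    # Console sink for demonstration; a real job would target a Delta table
    # or another durable sink with checkpointing enabled.
    query = parsed.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
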

Send your resume to avinash.allure@bhasaka.com
