Senior Data Engineer

Bhasaka Technologies

Pune District, Bengaluru, Hyderabad

Hybrid

INR 15,00,000 - 25,00,000

Full-time


Job summary

A leading data engineering company in India is looking for a highly experienced Senior Data Engineer to design, build, and maintain scalable data pipelines. The ideal candidate will have strong expertise in PySpark, 5+ years of experience in data engineering, and proficiency in SQL and Python. This role involves collaborating with various teams to ensure data quality and governance.

Qualifications

  • 5+ years of experience in data engineering roles.
  • Strong hands-on experience with PySpark for data transformation and processing.
  • Deep understanding of distributed computing concepts, data partitioning, and performance tuning.

Responsibilities

  • Design, implement, and maintain scalable ETL/ELT pipelines using PySpark.
  • Collaborate closely with data scientists, analysts, and business stakeholders.
  • Optimize PySpark jobs for distributed environments.

Skills

PySpark
SQL
Python
Distributed computing concepts
Data engineering

Tools

Apache Spark
Hadoop
Delta Lake
Apache Airflow

Job description
Senior Data Engineer - PySpark

Location: Pune / PAN India
Job Type: Full-time
Experience: 5+ years

Job Summary

We are looking for a highly experienced Senior Data Engineer with strong expertise in PySpark and large-scale data processing to join our data engineering team. In this role, you will be responsible for designing, building, and maintaining advanced data pipelines that support enterprise-level analytics and data science initiatives.

Key Responsibilities
  • Design, implement, and maintain scalable and efficient ETL/ELT pipelines using PySpark (a minimal sketch follows this list).
  • Work with big data frameworks such as Apache Spark, Hadoop, Hive, and Delta Lake.
  • Handle large volumes of structured and unstructured data across various sources (databases, APIs, flat files, etc.).
  • Develop and optimize complex data transformation workflows and batch/streaming jobs.
  • Ensure data quality, integrity, and governance throughout the data lifecycle.
  • Collaborate closely with data scientists, analysts, DevOps, and business stakeholders.
  • Troubleshoot performance issues and optimize PySpark jobs for distributed environments.
  • Manage workflows using orchestration tools such as Apache Airflow or similar.
  • Contribute to the architecture and design of scalable, fault-tolerant data platforms.
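
By way of illustration, here is a minimal sketch of the kind of PySpark ETL pipeline this role involves. The paths, column names, and app name are hypothetical placeholders, and the Delta write assumes the delta-spark package is installed; this is an example under those assumptions, not a description of our actual pipelines.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_daily_etl").getOrCreate()

    # Extract: read raw order events landed by an upstream system.
    raw = spark.read.option("header", True).csv("s3://raw-zone/orders/")

    # Transform: fix types, deduplicate, and build a daily revenue aggregate.
    orders = (
        raw.withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("amount", F.col("amount").cast("double"))
           .dropDuplicates(["order_id"])
           .filter(F.col("amount").isNotNull())
    )
    daily_revenue = (
        orders.groupBy(F.to_date("order_ts").alias("order_date"))
              .agg(F.sum("amount").alias("revenue"))
    )

    # Load: write the curated table, partitioned by date, in Delta format.
    (daily_revenue.write.format("delta").mode("overwrite")
                  .partitionBy("order_date")
                  .save("s3://curated-zone/daily_revenue/"))
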
Required Skills & Qualifications
  • 5+ years of experience in data engineering roles.
  • Strong hands-on experience with PySpark for data transformation and processing.
  • Proficient in SQL and Python.
  • Deep understanding of distributed computing concepts, data partitioning, and performance tuning (illustrated in the sketch after this list).
  • Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
  • Experience with data lakehouse architectures (e.g., Delta Lake, Databricks).
  • Knowledge of version control (Git), CI/CD, and agile methodologies.
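
To make the partitioning and tuning expectations concrete, a small illustrative example; the dataset contents and partition count are stand-ins, and real tables would be far larger.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("tuning_demo").getOrCreate()

    # Tiny stand-in datasets; in production these would be large tables.
    events = spark.createDataFrame(
        [(1, "click"), (2, "view"), (1, "view")], ["customer_id", "event"]
    )
    customers = spark.createDataFrame(
        [(1, "IN"), (2, "US")], ["customer_id", "country"]
    )

    # Repartition on the join key so shuffle output is evenly distributed.
    events = events.repartition(8, "customer_id")

    # Broadcast the small dimension table to avoid a shuffle join entirely.
    enriched = events.join(broadcast(customers), "customer_id")
    enriched.show()
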
Nice to Have
  • Exposure to streaming data technologies like Kafka, Spark Streaming, or Flink (see the streaming sketch after this list).
  • Experience with Snowflake, Redshift, or BigQuery.
  • Knowledge of data governance, data lineage, and cataloging tools (e.g., Collibra, Alation).
  • Familiarity with containerization (Docker, Kubernetes).
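
For the streaming exposure above, a bare-bones Structured Streaming sketch reading from Kafka; the broker address and topic name are placeholders, and the spark-sql-kafka connector must be on the classpath.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka_stream_demo").getOrCreate()

    # Subscribe to a Kafka topic; Kafka delivers the payload as bytes.
    stream = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("subscribe", "orders")
             .load()
    )

    # Cast the binary value to a string before any downstream parsing.
    parsed = stream.select(F.col("value").cast("string").alias("payload"))

    # Console sink for demonstration; a real job would target a Delta table
    # or another durable sink with checkpointing enabled.
    query = parsed.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
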

Send your resume to avinash.allure@bhasaka.com
