- 5–6 years of total experience in data engineering or big data development.
- 2–3 years hands-on experience with Apache Spark.
- Strong programming skills in PySpark, Python, and Scala.
- 2+ years of experience in Scala backend development.
- Proficient in Scala, covering both object-oriented and functional programming concepts.
- Deep understanding and application of advanced functional programming concepts like category theory, monads, applicatives, and type classes.
- Hands-on experience with Scala Typelevel libraries such as Cats, Shapeless, and others used to build applications with strong typing and efficient concurrency (an illustrative sketch follows this list).
- Solid understanding of data lakes, lakehouses, and Delta Lake concepts.
- Experience in SQL development and performance tuning.
- Proficient in cloud services (e.g., S3, Glue, Lambda, EMR, Redshift, CloudWatch, IAM).
- Familiarity with Airflow, dbt, or similar orchestration tools is a plus.
- Experience with CI/CD tools such as Jenkins, GitHub Actions, or AWS CodePipeline.
- Knowledge of data security, governance, and compliance frameworks.
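To illustrate the kind of Typelevel-style Scala this role expects, here is a minimal, hypothetical sketch using Cats: field-level validation composed applicatively with `ValidatedNel` so that errors accumulate, then lifted over a batch with `traverse`. The record and function names (`RawEvent`, `validateBatch`, etc.) are illustrative only and not part of this posting.

```scala
import cats.data.ValidatedNel
import cats.syntax.all._

// Hypothetical raw record arriving from an upstream source.
final case class RawEvent(userId: String, amount: String)
final case class Event(userId: String, amount: BigDecimal)

object EventValidation {
  type Result[A] = ValidatedNel[String, A]

  // Validate one field, accumulating errors instead of failing fast.
  private def nonEmpty(field: String, value: String): Result[String] =
    if (value.trim.nonEmpty) value.validNel
    else s"$field must not be empty".invalidNel

  private def decimal(field: String, value: String): Result[BigDecimal] =
    Either
      .catchNonFatal(BigDecimal(value))
      .leftMap(_ => s"$field is not a valid number: '$value'")
      .toValidatedNel

  // Applicative composition: both checks run and all errors are collected.
  def validate(raw: RawEvent): Result[Event] =
    (nonEmpty("userId", raw.userId), decimal("amount", raw.amount))
      .mapN(Event.apply)

  // Traverse a whole batch: the result is Valid only if every record is valid.
  def validateBatch(batch: List[RawEvent]): Result[List[Event]] =
    batch.traverse(validate)
}
```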
Responsibilities
- Develop and maintain scalable data pipelines using Apache Spark on Databricks (see the sketch after this list).
- Build end-to-end ETL/ELT pipelines on AWS/GCP/Azure using services like S3, Glue, Lambda, EMR, and Step Functions.
- Collaborate with data scientists, analysts, and business stakeholders to deliver high-quality data solutions.
- Design and implement data models, schemas, and Lakehouse architecture in Databricks/Snowflake.
- Optimize and tune Spark jobs for performance and cost-efficiency.
- Integrate data from multiple structured and unstructured data sources.
- Monitor and manage data workflows, ensuring data quality and consistency.
- Follow CI/CD, code versioning (Git), and DevOps best practices for data applications.
- Write clean, reusable, well-documented code using Python / PySpark / Scala.
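For illustration only, below is a minimal sketch of the kind of Spark pipeline these responsibilities describe: read raw files from S3, apply a light transformation, and write a partitioned Delta table. The bucket, paths, and column names are hypothetical, and the Delta Lake setup is assumed to be the Databricks default.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object DailyOrdersJob {
  def main(args: Array[String]): Unit = {
    // On Databricks a session already exists; getOrCreate reuses it.
    val spark = SparkSession.builder()
      .appName("daily-orders-etl")
      .getOrCreate()

    // Hypothetical raw landing zone on S3.
    val raw = spark.read
      .option("header", "true")
      .csv("s3://example-bucket/landing/orders/")

    // Light transformation: type the columns and derive a partition key.
    val orders = raw
      .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
      .withColumn("order_date", F.to_date(F.col("order_ts")))
      .filter(F.col("order_id").isNotNull)

    // Write as a partitioned Delta table (assumes Delta Lake is available,
    // as it is by default on Databricks).
    orders.write
      .format("delta")
      .mode("overwrite")
      .partitionBy("order_date")
      .save("s3://example-bucket/lakehouse/orders/")

    spark.stop()
  }
}
```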