- 5–6 years of total experience in data engineering or big data development.
- 2–3 years hands-on experience with Apache Spark.
- Strong programming skills in PySpark, Python, and Scala.
- 2+ years of experience in Scala backend development.
- Proficient in Scala, covering both object-oriented and functional programming concepts.
- Deep understanding and application of advanced functional programming concepts like category theory, monads, applicatives, and type classes.
- Hands-on experience with Scala Typelevel libraries such as Cats, Shapeless, and others used to build applications with strong typing and efficient concurrency (an illustrative sketch follows this list).
- Solid understanding of data lakes, lakehouses, and Delta Lake concepts.
- Experience in SQL development and performance tuning.
- Proficient in cloud services (e.g., S3, Glue, Lambda, EMR, Redshift, CloudWatch, IAM).
- Familiarity with Airflow, dbt, or similar orchestration tools is a plus.
- Experience with CI/CD tools such as Jenkins, GitHub Actions, or AWS CodePipeline.
- Knowledge of data security, governance, and compliance frameworks.
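To illustrate the kind of Typelevel-style Scala this role expects, here is a minimal, hypothetical sketch using Cats: field-level validation composed applicatively with `ValidatedNel` so that errors accumulate, then lifted over a batch with `traverse`. The record and function names (`RawEvent`, `validateBatch`, etc.) are illustrative only and not part of this posting.

```scala
import cats.data.ValidatedNel
import cats.syntax.all._

// Hypothetical raw record arriving from an upstream source.
final case class RawEvent(userId: String, amount: String)
final case class Event(userId: String, amount: BigDecimal)

object EventValidation {
  type Result[A] = ValidatedNel[String, A]

  // Validate one field, accumulating errors instead of failing fast.
  private def nonEmpty(field: String, value: String): Result[String] =
    if (value.trim.nonEmpty) value.validNel
    else s"$field must not be empty".invalidNel

  private def decimal(field: String, value: String): Result[BigDecimal] =
    Either
      .catchNonFatal(BigDecimal(value))
      .leftMap(_ => s"$field is not a valid number: '$value'")
      .toValidatedNel

  // Applicative composition: both checks run and all errors are collected.
  def validate(raw: RawEvent): Result[Event] =
    (nonEmpty("userId", raw.userId), decimal("amount", raw.amount))
      .mapN(Event.apply)

  // Traverse a whole batch: the result is Valid only if every record is valid.
  def validateBatch(batch: List[RawEvent]): Result[List[Event]] =
    batch.traverse(validate)
}
```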
Responsibilities
- Develop and maintain scalable data pipelines using Apache Spark on Databricks (see the sketch after this list).
- Build end-to-end ETL/ELT pipelines on AWS/GCP/Azure using services like S3, Glue, Lambda, EMR, and Step Functions.
- Collaborate with data scientists, analysts, and business stakeholders to deliver high-quality data solutions.
- Design and implement data models, schemas, and Lakehouse architecture in Databricks/Snowflake.
- Optimize and tune Spark jobs for performance and cost-efficiency.
- Integrate data from multiple structured and unstructured data sources.
- Monitor and manage data workflows, ensuring data quality and consistency.
- Follow CI/CD, code versioning (Git), and DevOps best practices for data applications.
- Write clean, reusable, well-documented code using Python / PySpark / Scala.
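For illustration only, below is a minimal sketch of the kind of Spark pipeline these responsibilities describe: read raw files from S3, apply a light transformation, and write a partitioned Delta table. The bucket, paths, and column names are hypothetical, and the Delta Lake setup is assumed to be the Databricks default.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object DailyOrdersJob {
  def main(args: Array[String]): Unit = {
    // On Databricks a session already exists; getOrCreate reuses it.
    val spark = SparkSession.builder()
      .appName("daily-orders-etl")
      .getOrCreate()

    // Hypothetical raw landing zone on S3.
    val raw = spark.read
      .option("header", "true")
      .csv("s3://example-bucket/landing/orders/")

    // Light transformation: type the columns and derive a partition key.
    val orders = raw
      .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
      .withColumn("order_date", F.to_date(F.col("order_ts")))
      .filter(F.col("order_id").isNotNull)

    // Write as a partitioned Delta table (assumes Delta Lake is available,
    // as it is by default on Databricks).
    orders.write
      .format("delta")
      .mode("overwrite")
      .partitionBy("order_date")
      .save("s3://example-bucket/lakehouse/orders/")

    spark.stop()
  }
}
```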