Job Posting: Spark Developer (Data Modernization Project)
About the Role:
We are seeking skilled and motivated Spark Developers (multiple openings) to join a dynamic data engineering team. You will be a key contributor to a strategic, large-scale data modernization initiative for a leading global financial institution. This project involves refactoring, upgrading, and deploying a significant portfolio of PySpark scripts to modernize a critical data platform.
This is a fantastic opportunity to work on a high-impact project, enhance your skills with the latest Spark technologies, and gain invaluable experience in the financial services domain.
Key Responsibilities:
- Refactor and upgrade legacy PySpark scripts to be modular, reusable, and compliant with Spark 3.3+ and Python 3.10+.
- Optimize Spark jobs for high performance using techniques like broadcast joins, effective partitioning, and predicate pushdown.
- Replace legacy APIs (e.g., RDD-based transformations, row-at-a-time Python UDFs) with optimized DataFrame and Pandas UDF implementations.
- Implement robust code with structured logging, comprehensive error handling, and alerting mechanisms.
- Ensure data quality and integrity through schema enforcement, consistent data typing, and correct SCD (Slowly Changing Dimensions) logic.
- Collaborate within an Agile team, participating in code reviews, sprint planning, and daily stand-ups.
- Support the integration of code into CI/CD pipelines and contribute to automated testing frameworks.
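To give candidates a feel for the optimization work above: a broadcast join avoids shuffling both sides of a join by shipping the small table, held in memory as a hash map, to every executor. A minimal pure-Python sketch of that idea follows; the function name and row shapes are illustrative only, not project code.

```python
def broadcast_join(large_rows, small_rows, key):
    """Inner-join a large row stream against a small table.

    Mirrors the broadcast-join strategy: build an in-memory hash map of
    the small side once, then probe it per row of the large side, instead
    of shuffling and sorting both inputs.
    """
    # "Broadcast" step: materialize the small side as a lookup table.
    lookup = {row[key]: row for row in small_rows}

    joined = []
    for row in large_rows:
        match = lookup.get(row[key])
        if match is not None:
            # Merge columns from both sides, keeping the join key once.
            joined.append({**row, **{k: v for k, v in match.items() if k != key}})
    return joined
```

In PySpark itself this corresponds to hinting the optimizer with `broadcast()` on the small DataFrame rather than writing the join by hand.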
Qualifications and Experience:
- Education: Bachelor's or Master's degree in Software Engineering, IT, Computer Science, or a related field.
- Experience: 3 to 5 years of hands-on experience in PySpark development.
Mandatory Technical Skills:
- PySpark Development: 3-5 years of proven experience in refactoring and developing efficient PySpark scripts using DataFrame APIs.
- Spark Optimization: 2-3 years of practical experience in performance tuning (e.g., broadcast joins, partitioning strategies, predicate pushdown).
- PySpark Migration: Hands-on experience with PySpark migration or modernization projects.
- Banking & Financial Data Models: Understanding of financial data concepts, including SCD logic, surrogate keys, and schema evolution.
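For candidates less familiar with the SCD terminology above: SCD Type 2 preserves history by expiring the current version of a changed row and appending a new open-ended version. A minimal pure-Python sketch of that upsert logic follows; all names and the row layout are hypothetical, for illustration only.

```python
from datetime import date

# Conventional "open-ended" high date marking the current row version.
HIGH_DATE = date(9999, 12, 31)

def scd2_upsert(dimension, incoming, today):
    """Apply SCD Type 2 logic to a dimension table.

    dimension: list of dicts with keys 'key', 'attrs', 'valid_from', 'valid_to'
    incoming:  dict mapping business key -> latest attribute dict
    """
    out = []
    current_keys = set()
    for row in dimension:
        if row["valid_to"] == HIGH_DATE and row["key"] in incoming:
            current_keys.add(row["key"])
            new_attrs = incoming[row["key"]]
            if new_attrs != row["attrs"]:
                # Change detected: expire the old version, open a new one.
                out.append({**row, "valid_to": today})
                out.append({"key": row["key"], "attrs": new_attrs,
                            "valid_from": today, "valid_to": HIGH_DATE})
                continue
        out.append(row)  # unchanged or historical rows pass through
    # Brand-new business keys get an initial open-ended version.
    for key, attrs in incoming.items():
        if key not in current_keys:
            out.append({"key": key, "attrs": attrs,
                        "valid_from": today, "valid_to": HIGH_DATE})
    return out
```

In production PySpark this pattern is typically expressed as a DataFrame join plus a MERGE-style write, but the expire-and-append semantics are the same.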
Good-to-Have Skills:
- Testing Frameworks (e.g., Pytest, Great Expectations).
- Data Governance & Compliance (e.g., PII/PHI handling, data lineage).
- Operational Readiness (e.g., backfill support, idempotent writes).