Overview
We are seeking a highly skilled Spark Developer with strong experience in Python, AWS, and SQL to join our team. The ideal candidate will be responsible for designing, developing, and optimizing large-scale data processing solutions, ensuring data quality, scalability, and performance. This role requires a solid background in distributed computing, cloud environments, and data engineering best practices.
Compensation (NYC): USD 120,000-150,000 gross per year, depending on interview results.
Responsibilities
- Design, develop, and maintain scalable data pipelines using Apache Spark (batch and/or streaming).
- Build, optimize, and manage ETL/ELT workflows integrating multiple data sources.
- Develop Python solutions for data transformation, automation, and orchestration.
- Leverage AWS services (S3, EMR, Glue, Lambda, Redshift, Kinesis, etc.) to implement cloud-native data platforms.
- Write efficient SQL queries for data extraction, transformation, and reporting.
- Ensure data quality, lineage, and governance across pipelines.
- Collaborate with data engineers, architects, and analysts to deliver end-to-end data solutions.
- Troubleshoot performance bottlenecks and optimize Spark jobs for speed and cost-efficiency.
Qualifications
Must have
- 8+ years of experience in data engineering or backend development.
- Hands-on experience with Apache Spark (PySpark) in large-scale data environments.
- Strong proficiency in Python programming.
- Expertise in SQL (including advanced queries, performance tuning, and optimization).
- Experience working with AWS services such as S3, Glue, EMR, Lambda, Redshift, or Kinesis.
- Understanding of data warehousing concepts and ETL best practices.
- Strong problem-solving skills and ability to work in an agile, collaborative environment.
Nice to have
- Experience with Databricks or similar Spark-based platforms.
- Knowledge of streaming frameworks (Kafka, Flink).
- Familiarity with CI/CD pipelines, Docker, Kubernetes, Terraform.
- Exposure to data modeling (star schema, snowflake schema, data vault).
- Experience in financial services / capital markets.