Position Overview
We're seeking a self-sufficient Senior Data Engineer to build and scale the data infrastructure that supports our product, engineering, and analytics teams. You'll architect data pipelines, optimize our data platform, and ensure teams have reliable, high-quality data to drive business decisions.
This is a hands-on role for someone who can own the entire data engineering stack, from ingestion through transformation to orchestration. You'll work independently to solve complex data challenges and build scalable solutions.
Core Responsibilities
Data Pipeline Development & Optimization
- Design, build, and maintain scalable data pipelines using Spark and Databricks
- Develop ETL/ELT workflows to process large volumes of customer behavior data
- Optimize Spark jobs for performance, cost efficiency, and reliability
- Build real-time and batch data processing solutions
- Implement data quality checks and monitoring throughout pipelines (see the illustrative sketch after this list)
- Ensure data freshness and SLA compliance for analytics workloads
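For context, here is a minimal sketch of the kind of batch pipeline and data quality gate described above, written in PySpark. The S3 paths and column names (`event_id`, `user_id`, `event_timestamp`) are illustrative placeholders, not actual project values.

```python
# Minimal PySpark batch job: ingest raw events, apply a basic data quality
# gate, and write date-partitioned Parquet. Paths and columns are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_events_batch").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/events/")  # hypothetical landing path

events = (
    raw
    .withColumn("event_date", F.to_date("event_timestamp"))
    .dropDuplicates(["event_id"])
)

# Simple data quality gate: fail the run if required keys are missing.
bad_rows = events.filter(F.col("event_id").isNull() | F.col("user_id").isNull()).count()
if bad_rows > 0:
    raise ValueError(f"Data quality check failed: {bad_rows} rows missing keys")

(
    events
    .write.mode("overwrite")
    .partitionBy("event_date")            # partition for downstream analytics scans
    .parquet("s3://example-bucket/curated/events/")
)
```

In production this gate would typically feed alerting and SLA dashboards rather than simply raising, but the shape of the check is the same.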
AWS Data Infrastructure
- Architect and manage data infrastructure on AWS (S3, Glue, EMR, Redshift)
- Design and implement data lake architecture with proper partitioning and optimization (illustrated in the sketch after this list)
- Configure and optimize AWS Glue for ETL jobs and data cataloging
- Migrate Glue jobs to zero-ETL integrations where appropriate
- Implement security best practices for data access and governance
- Monitor and optimize cloud costs related to data infrastructure
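As an illustration of the partitioning and cataloging work above, a minimal sketch that writes a curated dataset to S3 partitioned by date and registers it as a table. It assumes Spark is configured to use the AWS Glue Data Catalog as its metastore (the default on Glue ETL and a common EMR setup); the database, table, and path names are hypothetical.

```python
# Write curated events as date-partitioned Parquet and register the table so
# Athena and Redshift Spectrum can query it through the Glue Data Catalog.
# All names and paths below are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("curate_events")
    .enableHiveSupport()          # Hive support backed by the Glue catalog
    .getOrCreate()
)

events = spark.read.parquet("s3://example-bucket/staging/events/")

(
    events
    .withColumn("event_date", F.to_date("event_timestamp"))
    .write.mode("overwrite")
    .format("parquet")
    .partitionBy("event_date")                               # prune scans by date
    .option("path", "s3://example-bucket/analytics/events/")
    .saveAsTable("analytics.events")                         # lands in the Glue catalog
)
```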
Data Modeling & Architecture
- Design and implement dimensional data models for analytics
- Build star/snowflake schemas optimized for analytical queries
- Create data marts for specific business domains (retention, campaigns, product)
- Ensure data model scalability and maintainability
- Document data lineage, dependencies, and business logic
- Implement slowly changing dimensions and historical tracking (see the SCD Type 2 sketch after this list)
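To make the slowly-changing-dimension work concrete, here is a condensed sketch of the standard SCD Type 2 merge pattern on Delta Lake. The dimension table, keys, and tracked attribute (`address`) are hypothetical; the merge-key trick lets a single MERGE close the current row and insert the new version.

```python
# SCD Type 2 upsert into a Delta dimension table. Table, column, and path
# names are illustrative placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("scd2_dim_customer").getOrCreate()

dim = DeltaTable.forName(spark, "analytics.dim_customer")
updates = spark.read.parquet("s3://example-bucket/staging/customers/")

# Updates whose tracked attribute changed get a second copy with a NULL merge
# key, so the MERGE both closes the old row and inserts the new one.
changed = (
    updates.alias("u")
    .join(dim.toDF().alias("d"), "customer_id")
    .where("d.is_current = true AND d.address <> u.address")
    .selectExpr("NULL AS merge_key", "u.*")
)
staged = updates.selectExpr("customer_id AS merge_key", "*").unionByName(changed)

(
    dim.alias("d")
    .merge(staged.alias("s"), "d.customer_id = s.merge_key AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.address <> s.address",
        set={"is_current": "false", "end_date": "current_date()"},
    )
    .whenNotMatchedInsert(
        values={
            "customer_id": "s.customer_id",
            "address": "s.address",
            "is_current": "true",
            "start_date": "current_date()",
            "end_date": "NULL",
        },
    )
    .execute()
)
```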
Orchestration & Automation
- Build and maintain workflow orchestration using Airflow or similar tools
- Implement scheduling, monitoring, and alerting for data pipelines (see the example DAG after this list)
- Create automated data quality validation frameworks
- Design retry logic and error handling for production pipelines
- Build CI/CD pipelines for data workflows
- Automate infrastructure provisioning using Infrastructure as Code
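To make the orchestration expectations concrete, a minimal Airflow (2.x) DAG sketch with daily scheduling, retries, and a downstream quality check. The DAG id, schedule, and task callables are illustrative placeholders; in practice the transform step might trigger a Databricks job run or an EMR step rather than a local Python callable.

```python
# Minimal Airflow DAG: daily schedule, retry logic, failure alerting, and a
# downstream data quality task. Ids and callables are illustrative.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_spark_job(**context):
    # Placeholder: would typically trigger a Databricks or EMR job here.
    pass

def validate_output(**context):
    # Placeholder: row counts, freshness, and schema checks would go here.
    pass

with DAG(
    dag_id="daily_events_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="0 3 * * *",                          # run daily at 03:00 UTC
    catchup=False,
    default_args={
        "retries": 2,                              # simple retry logic
        "retry_delay": timedelta(minutes=10),
        "email_on_failure": True,                  # alerting hook
    },
) as dag:
    transform = PythonOperator(task_id="run_spark_job", python_callable=run_spark_job)
    quality_check = PythonOperator(task_id="validate_output", python_callable=validate_output)

    transform >> quality_check
```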
Cross-Functional Collaboration
- Partner with the Senior Data Analyst to understand analytics requirements
- Work with the Growth Director and team to enable data-driven decision making
- Support the CRM Lead with data needs for campaign execution
- Collaborate with Product and Engineering on event tracking and instrumentation
- Document technical specifications and best practices for the team
- Work closely with all squads and establish data contracts with engineers so that data lands in an optimal, analytics-ready shape (see the schema-check sketch below)
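One lightweight way to make a data contract with engineering squads enforceable is a schema check at ingestion; the sketch below illustrates the idea. The contract fields, types, and landing path are hypothetical.

```python
# Reject landed data that drifts from the agreed event contract.
# Field names, types, and the landing path are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("event_contract_check").getOrCreate()

CONTRACT = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_name", StringType()),
    StructField("event_timestamp", TimestampType()),
])

incoming = spark.read.parquet("s3://example-bucket/landing/product_events/")

expected = {f.name: f.dataType for f in CONTRACT.fields}
actual = {f.name: f.dataType for f in incoming.schema.fields}

missing = sorted(set(expected) - set(actual))
mismatched = sorted(
    name for name, dtype in expected.items()
    if name in actual and actual[name] != dtype
)
if missing or mismatched:
    raise ValueError(
        f"Data contract violation: missing={missing}, type_mismatch={mismatched}"
    )
```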
Required Qualifications
Must-Have Technical Skills
- Apache Spark: Expert-level proficiency in PySpark/Spark SQL for large-scale data processing
- Databricks: Strong hands-on experience building and optimizing pipelines on the Databricks platform
- AWS: Deep knowledge of AWS data services (S3, Glue, EMR, Redshift, Athena)
- Data Modeling: Proven experience designing dimensional models and data warehouses
- Orchestration: Strong experience with workflow orchestration tools (Airflow, Prefect, or similar)
- SQL: Advanced SQL skills for complex queries and optimization
- Python: Strong programming skills for data engineering tasks
Experience
- 6-10 years in data engineering with a focus on building scalable data platforms
- Proven track record architecting and implementing data infrastructure from scratch
- Experience processing large volumes of event data (billions of records)
- Background in high-growth tech companies or consumer-facing products
- Experience with mobile/web analytics data preferred
Technical Requirements
- Expert in Apache Spark (PySpark and Spark SQL) with performance tuning experience
- Deep hands-on experience with Databricks (clusters, jobs, notebooks, Delta Lake)
- Strong AWS expertise: S3, Glue, EMR, Redshift, Athena, Lambda, CloudWatch
- Proficiency with orchestration tools: Airflow, Prefect, Step Functions, or similar
- Advanced data modeling skills: dimensional modeling, normalization, denormalization
- Experience with data formats: Parquet, Avro, ORC, Delta Lake
- Version control with Git and CI/CD practices
- Infrastructure as Code: Terraform, CloudFormation, or similar
- Understanding of data streaming technologies (Kafka, Kinesis) is a plus
Core Competencies
- Self-sufficient: You figure things out independently without constant guidance
- Problem solver: You diagnose and fix complex data pipeline issues autonomously
- Performance-focused: You optimize for speed, cost, and reliability
- Quality-driven: You build robust, maintainable, and well-documented solutions
- Ownership mindset: You take end-to-end responsibility for your work
- Collaborative: You work well with analysts and business stakeholders while operating independently
Nice-to-Have
- Databricks certifications (Data Engineer Associate/Professional)
- Experience with dbt for data transformation
- Knowledge of customer data platforms (Segment, mParticle, Rudderstack)
- Experience with event tracking platforms (Mixpanel, Amplitude)
- Familiarity with machine learning infrastructure and MLOps
- Experience in MENA region or emerging markets
- Background in on-demand services, marketplaces, or subscription businesses
- Knowledge of real-time streaming architectures
What We Offer
- Competitive salary based on experience
- Ownership of critical data infrastructure and architecture decisions
- Work with a modern data stack and cutting-edge AWS technologies
- Direct impact on business decisions through data platform improvements
- Comprehensive health benefits