Lead Data Engineer

WorkHQ

Los Angeles (CA)

Remote

USD 120,000 - 160,000

Full time

Today

Job summary

A well-funded HRTech startup is seeking a Lead Data Infrastructure Architect to manage data infrastructure handling billions of data points across more than 250 million professional profiles. The role involves designing scalable data pipelines, architecting ETL processes using PySpark, and integrating new data sources into the existing setup. Ideal candidates will have significant experience in data engineering, proficiency in AWS services, and strong Python and SQL skills.

Qualifications

  • 5-8 years of professional data engineering experience.
  • Strong background in big data processing architectures.
  • Experience in data warehouse design and performance optimization.

Responsibilities

  • Design scalable data pipelines processing massive record volumes.
  • Architect ETL processes using PySpark on Amazon EMR.
  • Integrate new data sources into the main pipeline.

Skills

PySpark and distributed computing
AWS data services (EMR, Glue, Athena)
Docker
Advanced Python
SQL skills

Tools

Postgres
OpenSearch
DataFrame manipulation with Pandas
Job description

Company Context

Series A, well-funded US startup in HRTech developing WorkHQ.com and an AI Recruiter product.

This is a remote, US-only role (mainland US).

Role Overview

Lead data infrastructure architect managing billions of data points across 250M+ professional profiles.

You will hire data engineers to support you in that work.

Core Responsibilities
  • Design scalable data pipelines processing massive record volumes

  • Architect ETL processes using PySpark on Amazon EMR (open to shifting to other solutions such as Databricks or Snowflake)

  • Distribute enriched data through medallion architecture across Postgres, Athena, OpenSearch

  • Integrate new data sources into the main pipeline

  • Implement advanced data matching using Splink
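
The Splink responsibility above refers to probabilistic record linkage. As a rough, stdlib-only illustration of the Fellegi-Sunter-style scoring that underlies tools like Splink (this is not Splink's actual API; the field names, weights, and threshold are hypothetical and would normally be estimated from the data):

```python
# Hypothetical per-field log-odds weights: (weight if values agree, weight if not).
# Splink estimates comparable parameters from the data (e.g. via EM).
WEIGHTS = {
    "first_name": (4.0, -2.0),
    "last_name":  (5.0, -3.0),
    "company":    (3.0, -1.0),
}

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Sum log-odds weights over compared fields; higher = more likely a match."""
    score = 0.0
    for field, (agree_w, disagree_w) in WEIGHTS.items():
        if rec_a.get(field) and rec_a.get(field) == rec_b.get(field):
            score += agree_w
        else:
            score += disagree_w
    return score

def is_match(rec_a: dict, rec_b: dict, threshold: float = 6.0) -> bool:
    """Classify a candidate pair as a match above a (hypothetical) threshold."""
    return match_score(rec_a, rec_b) >= threshold

a = {"first_name": "jane", "last_name": "doe", "company": "workhq"}
b = {"first_name": "jane", "last_name": "doe", "company": "acme"}
```

In practice the expensive part is avoiding all-pairs comparison across 250M+ profiles, which is where blocking rules (another Splink concept) come in.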

Technical Requirements
  • 5-8 years professional data engineering experience

  • Good proficiency in:

    • PySpark and distributed computing

    • AWS data services (EMR, Glue, Athena)

    • Docker

    • Pandas and DataFrame manipulation

    • Complex data format handling (JSONL, Parquet)

  • Strong background in:

    • Big data processing architectures

    • Data warehouse design

    • Performance optimization

  • Advanced Python, SQL skills
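
"Complex data format handling (JSONL, Parquet)" above includes the line-delimited JSON case. A minimal stdlib sketch of robust JSONL ingestion (the record fields shown are hypothetical):

```python
import io
import json

def read_jsonl(stream):
    """Yield one parsed record per non-empty line, skipping malformed lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # In a real pipeline, route bad lines to a dead-letter sink instead.
            continue

raw = io.StringIO('{"id": 1, "name": "jane"}\nnot json\n{"id": 2, "name": "bob"}\n')
records = list(read_jsonl(raw))
```

The same streaming pattern scales to large files because it never holds more than one line in memory at a time; for Parquet, a columnar reader (e.g. in PySpark or Pandas) replaces the line loop.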

Nice to Have
  • Probabilistic record linking expertise

  • OpenSearch/Elasticsearch technologies

  • Machine learning data pipeline design

  • Recruitment tech ecosystem knowledge

Technical Stack
  • Big Data: PySpark, EMR

  • Databases: Postgres, OpenSearch

  • Cloud: AWS

  • Containerization: Docker

  • Data Formats: JSONL, Parquet

  • Analytics: Metabase, Athena, Glue

  • Data Processing: Pandas, Splink

Other Considerations

While this role has specific requirements, if you lack a few of the technical skills but are motivated to learn and lead the platform, please apply for consideration.

If you are coming from a Director, Head of, or VP-level role relevant to this position, you are welcome to apply as well.

You will need to apply directly on our platform.

Thank you for your time.
