Lead Data Engineer

WorkHQ

Los Angeles (CA)

Remote

USD 140,000 - 180,000

Full time

Yesterday

Job summary

Join a well-funded US startup in HRTech as a Lead Data Engineer. You will manage data infrastructure and lead a team of data engineers. The role involves designing scalable data pipelines and optimizing data processing architectures using cutting-edge technologies. This position offers a competitive salary and the chance to work remotely in a dynamic environment.

Qualifications

5-8 years professional data engineering experience.
Proficiency in PySpark, AWS data services, Docker, and data manipulation using Pandas.

Responsibilities

Design scalable data pipelines processing massive record volumes.
Architect ETL processes using PySpark on Amazon EMR.
Integrate new data sources into the main pipeline.

Skills

PySpark
AWS data services
Docker
Python
SQL

Tools

Postgres
OpenSearch
EMR
Pandas
Splink

This range is provided by WorkHQ. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range

$140,000.00/yr - $180,000.00/yr

WorkHQ is an all-in-one recruiting platform that provides:

1. A database of 100M US professionals
2. Email and phone number lookup
3. Email outreach and sequencing
4. An applicant tracking system

Recruiting well can set a company up for long-term success, while poor recruiting can set it up for failure. We are working on a bold mission to replace the current jumble of multiple expensive and confusing systems with a single, affordable platform.

Company Context

Series A, well-funded US startup in HRTech developing WorkHQ.com and an AI Recruiter product.

Role Overview

As the lead data infrastructure architect, you will manage billions of data points across 250M+ professional profiles and hire data engineers to support that work.

Core Responsibilities

Design scalable data pipelines processing massive record volumes
Architect ETL processes using PySpark on Amazon EMR (open to shifting to other solutions such as Databricks or Snowflake)
Distribute enriched data through a medallion architecture across Postgres, Athena, and OpenSearch
Integrate new data sources into the main pipeline
Implement advanced data matching using Splink

Technical Requirements

5-8 years professional data engineering experience
Proficiency in:

PySpark and distributed computing
AWS data services (EMR, Glue, Athena)
Docker
Pandas and DataFrame manipulation
Handling complex data formats (JSONL, Parquet)

Strong background in:

Big data processing architectures
Data warehouse design
Performance optimization

Advanced Python and SQL skills

Nice to Have

Probabilistic record linking expertise
OpenSearch/Elasticsearch technologies
Machine learning data pipeline design
Recruitment tech ecosystem knowledge

Technical Stack

Big Data: PySpark, EMR
Databases: Postgres, OpenSearch
Cloud: AWS
Containerization: Docker
Data Formats: JSONL, Parquet
Analytics: Metabase, Athena, Glue
Data Processing: Pandas, Splink

Other Considerations

While this role has specific requirements, if you lack a few technical skills but are motivated to learn and lead the platform, please apply for consideration.

If you currently work at the Director, Head of, or VP level in a role relevant to this job, you can also apply.

You will need to apply directly on our platform.

Thank you for your time.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Other

Industries

IT Services and IT Consulting
