Lead Data Engineer

WorkHQ

Los Angeles (CA)

Remote

USD 140,000 - 180,000

Full time

Yesterday

Job summary

Join a well-funded US startup in HRTech as a Lead Data Engineer. You will manage data infrastructure and lead a team of data engineers. The role involves designing scalable data pipelines and optimizing data processing architectures using cutting-edge technologies. This position offers a competitive salary and the chance to work remotely in a dynamic environment.

Qualifications

  • 5-8 years professional data engineering experience.
  • Proficiency in PySpark, AWS data services, Docker, and data manipulation using Pandas.

Responsibilities

  • Design scalable data pipelines processing massive record volumes.
  • Architect ETL processes using PySpark on Amazon EMR.
  • Integrate new data sources into the main pipeline.

Skills

PySpark
AWS data services
Docker
Python
SQL

Tools

Postgres
OpenSearch
EMR
Pandas
Splink

Job description

This range is provided by WorkHQ. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range

$140,000.00/yr - $180,000.00/yr

WorkHQ is an all-in-one recruiting platform that provides:

  1. Database of 100M US professionals
  2. Email and phone number lookup
  3. Email outreach and sequencing
  4. Applicant tracking system

Recruiting well can set a company up for long-term success, while poor recruiting can set it up for failure. We are on a bold mission to replace the current jumble of multiple expensive and confusing systems with a single, affordable platform.

Company Context

Series A, well-funded US startup in HRTech developing WorkHQ.com and an AI Recruiter product.

Role Overview

As the lead data infrastructure architect, you will manage billions of data points across 250M+ professional profiles and hire data engineers to support that work.

Core Responsibilities

  • Design scalable data pipelines processing massive record volumes
  • Architect ETL processes using PySpark on Amazon EMR (open to shifting to other solutions like Databricks / Snowflake)
  • Distribute enriched data through medallion architecture across Postgres, Athena, OpenSearch
  • Integrate new data sources into the main pipeline
  • Implement advanced data matching using Splink

Technical Requirements

  • 5-8 years professional data engineering experience
  • Proficiency in:
    • PySpark and distributed computing
    • AWS data services (EMR, Glue, Athena)
    • Docker
    • Pandas and DataFrame manipulation
    • Handling complex data formats (JSONL, Parquet)
  • Strong background in:
    • Big data processing architectures
    • Data warehouse design
    • Performance optimization
  • Advanced Python and SQL skills

Nice to Have

  • Probabilistic record linking expertise
  • OpenSearch/Elasticsearch technologies
  • Machine learning data pipeline design
  • Recruitment tech ecosystem knowledge

Technical Stack

  • Big Data: PySpark, EMR
  • Databases: Postgres, OpenSearch
  • Cloud: AWS
  • Containerization: Docker
  • Data Formats: JSONL, Parquet
  • Analytics: Metabase, Athena, Glue
  • Data Processing: Pandas, Splink

Other Considerations

While this role has specific requirements, if you lack a few technical skills but are motivated to learn and lead the platform, please apply for consideration.

If you come from Director/Head of/VP levels relevant to this job, you can also apply.

You will need to apply directly on our platform.

Thank you for your time.

Seniority level
  • Mid-Senior level
Employment type
  • Full-time
Job function
  • Other
Industries
  • IT Services and IT Consulting
