Lead Data Engineer

WorkHQ

Los Angeles (CA)

Remote

USD 140,000 - 180,000

Full time

Yesterday

Job summary

WorkHQ is seeking a Lead Data Engineer to manage data infrastructure spanning 250M+ professional profiles. The role involves designing scalable data pipelines, integrating new data sources, and implementing advanced data matching. This remote, US-based position pays $140,000 to $180,000 per year, depending on skills and experience.

Qualifications

  • 5-8 years of professional data engineering experience.
  • Strong background in big data processing architectures.

Responsibilities

  • Design scalable data pipelines processing massive record volumes.
  • Architect ETL processes using PySpark on Amazon EMR.

Skills

PySpark
AWS
Docker
Python
SQL

Tools

Postgres
OpenSearch
EMR
Glue
Pandas

Job description

Role Overview

As lead data infrastructure architect, you will manage billions of data points across 250M+ professional profiles and hire data engineers to support that work.

Core Responsibilities

  1. Design scalable data pipelines processing massive record volumes
  2. Architect ETL processes using PySpark on Amazon EMR (open to alternatives such as Databricks or Snowflake)
  3. Distribute enriched data through a medallion architecture across Postgres, Athena, and OpenSearch
  4. Integrate new data sources into the main pipeline
  5. Implement advanced data matching using Splink

Technical Requirements

  • 5-8 years of professional data engineering experience
  • Proficiency in:
    • PySpark and distributed computing
    • AWS data services (EMR, Glue, Athena)
    • Docker
    • Pandas and DataFrame manipulation
    • Handling complex data formats (JSONL, Parquet)
  • Strong background in:
    • Big data processing architectures
    • Data warehouse design
    • Performance optimization
  • Advanced Python and SQL skills

Nice to Have

  • Probabilistic record linking expertise
  • OpenSearch/Elasticsearch technologies
  • Machine learning data pipeline design
  • Knowledge of recruitment tech ecosystem

Technical Stack

  • Big Data: PySpark, EMR
  • Databases: Postgres, OpenSearch
  • Cloud: AWS
  • Containerization: Docker
  • Data Formats: JSONL, Parquet
  • Analytics: Metabase, Athena, Glue
  • Data Processing: Pandas, Splink

Additional Notes

If you lack some of the technical skills but are motivated to learn and lead, please apply for consideration. Candidates at Director, Head, or VP level with experience relevant to this role are encouraged to apply. You will need to apply directly on our platform.

This is a remote role within the US, with a salary range of $140,000 to $180,000 per year, depending on skills and experience.
