Lead Data Engineer

Randstad (Schweiz) AG

Los Angeles (CA)

Remote

USD 120,000 - 180,000

Full time

29 days ago

Job summary

An innovative HRTech startup is seeking a Lead Data Infrastructure Architect to manage billions of data points across more than 250 million professional profiles. This remote role involves designing scalable data pipelines, architecting ETL processes with PySpark, and integrating diverse data sources into a robust infrastructure. Ideal candidates have extensive data engineering experience, proficiency in AWS services, and a strong grasp of big data processing architectures. The company values motivation and leadership, and candidates who lack a few of the technical skills are still encouraged to apply. If you're ready to take on a pivotal role in shaping data solutions, this opportunity is for you!

Qualifications

  • 5-8 years of experience in data processing and data architecture.
  • Strong knowledge of AWS services and Docker required.

Responsibilities

  • Design scalable data pipelines for processing large volumes of data.
  • Architect ETL processes using PySpark on Amazon EMR.

Skills

PySpark
AWS data services
Docker
Advanced Python
SQL
DataFrame manipulation
Complex data format handling
Performance optimization

Tools

Amazon EMR
Postgres
OpenSearch
Athena
Splink
Metabase

Job description

Company Context

Series A, well-funded US startup in HRTech developing WorkHQ.com and an AI Recruiter product.

This is a remote, US-only role (mainland).

Role Overview

Lead data infrastructure architect managing billions of data points across 250M+ professional profiles.

You will also hire data engineers to support you in this work.

Core Responsibilities
  • Design scalable data pipelines processing massive record volumes
  • Architect ETL processes using PySpark on Amazon EMR (open to shifting to other solutions such as Databricks or Snowflake)
  • Distribute enriched data through medallion architecture across Postgres, Athena, OpenSearch
  • Integrate new data sources into the main pipeline
  • Implement advanced data matching using Splink

Technical Requirements
  • 5-8 years professional data engineering experience
  • Good proficiency in:
    • PySpark and distributed computing
    • AWS data services (EMR, Glue, Athena)
    • Docker
    • Pandas and DataFrame manipulation
    • Complex data format handling (JSONL, Parquet)
  • Strong background in:
    • Big data processing architectures
    • Data warehouse design
    • Performance optimization
  • Advanced Python and SQL skills

Nice to Have
  • Probabilistic record linking expertise
  • OpenSearch/Elasticsearch technologies
  • Machine learning data pipeline design
  • Recruitment tech ecosystem knowledge

Technical Stack
  • Big Data: PySpark, EMR
  • Databases: Postgres, OpenSearch
  • Cloud: AWS
  • Containerization: Docker
  • Data Formats: JSONL, Parquet
  • Analytics: Metabase, Athena, Glue
  • Data Processing: Pandas, Splink

Other Considerations

This role has specific requirements, but if you lack a few of the technical skills and are motivated to learn and to lead the platform, please apply for consideration.

If you currently hold a Director, Head of, or VP level position relevant to this job, you are welcome to apply as well.

You will need to apply directly on our platform.

Thank you for your time.
