Lead Data Engineer

ZipRecruiter

San Francisco (CA)

Remote

USD 130,000 - 170,000

Full time

4 days ago

Job summary

A geospatial AI startup is seeking a Lead Data Engineer to build and scale its inference pipeline for geospatial embeddings. The role involves integrating the pipeline with a web application and collaborating across teams to manage large, complex datasets. Applicants should have extensive experience with production data pipelines and strong skills in Python and data infrastructure. The role helps shape the company's technology foundation and comes with a competitive salary, equity, and flexible working arrangements.

Benefits

Competitive salary
Equity
Flexible work
Opportunity to shape tech foundation

Qualifications

  • 10+ years of experience with production data pipelines.
  • Experience with satellite imagery formats and protocols such as STAC, GeoTIFF, and Zarr.
  • Background in MLOps and geospatial AI.

Responsibilities

  • Build and scale the inference pipeline for geospatial embeddings.
  • Collaborate with front-end engineers and AWS for integrations.
  • Design and optimize scalable data processes.

Skills

Proficiency in Python
Docker
Production data pipelines
Geospatial libraries
PyTorch
AWS infrastructure
Database tools

Job description

About LGND
LGND is an early-stage startup revolutionizing geospatial AI infrastructure. We bridge the gap between large Earth observation models and specific application developers, enabling intuitive interaction with geospatial data. Our core mission is to empower decision-makers with rapid insights from vast, complex datasets. As part of our small, dynamic team, you will play a foundational role in building innovative tools.

Role Summary

We are seeking a Lead Data Engineer to design, build, and scale our inference pipeline for geospatial embeddings. This pipeline is central to LGND's product, integrating with a web application to generate embeddings for geographic areas based on user parameters. These embeddings will populate a custom, large-scale vector database.

The ideal candidate is experienced in production data pipelines, adaptable, and collaborative. AI and geospatial experience are not mandatory but preferred. The role will evolve into an engineering lead position overseeing all technological components.

This role is remote, with team members in San Francisco, Philadelphia, and Copenhagen.

Key Responsibilities

  • Build the Inference Pipeline:
      • Develop a scalable pipeline for geospatial embeddings based on user input, supporting parameters like geographic area, model type, and imagery source.
      • Balance pre-processed tokens with on-the-fly inference for performance.
      • Ensure the pipeline supports billions of embeddings, leveraging cloud and local compute resources.
  • Integration and Collaboration:
      • Work with front-end engineers for seamless integration.
      • Collaborate on proprietary vs open-source components.
      • Partner with AWS and external labs for integrations.
  • Scalability and Professionalism:
      • Design inheritable, maintainable pipelines.
      • Handle large data transfers efficiently.
      • Follow best practices in data engineering, DevOps, and MLOps.
  • Enhance Existing Projects:
      • Build on foundational work like embeddings-worker and embeddings-api to improve speed, scale, and extensibility.
  • Future Leadership:
      • Lead the inference pipeline component.
      • Potentially grow into an engineering manager role.

Scope of Work: First Two Months

  1. Increase Speed and Scale:
      • Optimize the pipeline to handle billions of embeddings with faster inference.
  2. Tokenize Imagery:
      • Create a process to tokenize imagery and store the output in S3.
  3. Run Model Inference:
      • Implement inference with pre-trained models, store the embeddings in a vector database, and collaborate with AWS.
  4. Nice-to-Have:
      • Develop mosaicking for cloud cover mitigation.

Additional Scope for First Two Months

  1. Operationalize CLIP Retrieval:
      • Build scalable inference for CLIP and other models, and store image chips in S3.
  2. Experiment with Multi-Modal Retrieval:
      • Enable image and text queries, combine embeddings, and explore methods like WEICOM.
  3. Database & API Design:
      • Design scalable vector search with external partners and develop APIs.
  4. Pre-Processing for Image Quality:
      • Develop mosaicking features for better image quality.
  5. Performance Optimization:
      • Ensure speed, scalability, and flexibility in inference.

Requirements

Technical Skills:

  • Proficiency in Python and Docker.
  • 10+ years of experience with production data pipelines.
  • Experience with geospatial libraries, PyTorch, cloud tools, and databases.
  • Knowledge of inference pipelines and real-time strategies.

Experience:

  • Satellite imagery formats and protocols (STAC, GeoTIFF, Zarr).
  • AWS infrastructure (bonus).
  • Background in MLOps and geospatial AI.

Soft Skills:

  • Self-directed, adaptable, collaborative, eager to learn.

Benefits

Cultural values include humility, integrity, and effectiveness. Benefits include competitive salary, equity, flexible work, and a chance to shape LGND's tech foundation.
