Founding Data Engineer

Stealth AI Startup

England

Hybrid

GBP 100,000 - 140,000

Full time

18 days ago

Job summary

A well-funded generative AI company is looking for a Founding Data Engineer to build and scale data pipelines in a hybrid work environment. Ideal candidates are skilled in Python and cloud infrastructure and have experience with distributed systems such as Kubernetes, Spark, and Databricks. This role offers a competitive salary of £100,000 to £140,000 plus equity. Experience in early-stage start-ups is a plus.

Qualifications

Experience building data infrastructure and pipelines from the ground up.
Hands-on experience with cloud-based infrastructure for faster deployments.
Prior experience working at an early-stage start-up is preferred.

Responsibilities

Build and optimize data pipelines for large-scale, multimodal datasets.
Design and operate distributed data processing across Spark, Databricks, and Kubernetes.
Productionize ML models from prototype to deployment.

Skills

Strong Python programming

Experience with distributed compute frameworks

Cloud infrastructure management (Kubernetes, CI/CD)

Dataset versioning, orchestration, and experiment tracking (DVC, Airflow, MLflow)

Tools

Kubernetes

Docker

PyTorch

Spark

Databricks

Founding Data Engineer

London (hybrid)

£100-140k base + equity

Do you have experience building data infrastructure and pipelines from the ground up?

Are you skilled in making cloud-based infrastructure (Kubernetes, Docker, CI/CD) faster and more reliable?

Do you have experience with ML deployments and pipelines?

We're a well-funded generative AI company training diffusion models to create a state-of-the-art platform for synthetic data generation.

We are seeking to recruit a Data Infrastructure Engineer to help design and scale the pipelines that power our core technology. You’ll enable our researchers and engineers to train, validate, and deploy AI models faster across cloud and distributed environments.

Build and optimize data pipelines for large-scale, multimodal datasets.
Design and operate distributed data processing across Spark, Databricks, and Kubernetes.
Improve developer productivity through faster builds, better orchestration, and scalable infra.
Productionize PyTorch ML models, from prototype to deployment.

We are seeking someone with the following experience:

Strong Python programming and solid data engineering fundamentals.
Hands‑on with Ray, Spark, Databricks, or similar distributed compute frameworks.
Experience managing cloud infrastructure, Kubernetes, and CI/CD pipelines.
Familiar with dataset versioning, orchestration, and experiment tracking (DVC, Airflow, MLflow).

Ideally, we are looking for someone with prior experience working at an early‑stage start‑up.