

Senior Data Engineer

Bespoke Labs

Remote

BRL 80,000 - 120,000

Full-time

Today

Job summary

A premier AI research lab is seeking a Staff/Senior Data Engineer for a high-impact remote contract. The role involves architecting and building complex data systems for AI model training in a fast-paced environment. The ideal candidate has extensive experience with enterprise-grade data platforms and proficiency in Python, Scala, and Spark, with a focus on producing high-quality data at scale. Best suited to candidates from Tier-1 enterprises. Competitive compensation offered.

Qualifications

  • 6+ years of Data Engineering experience.
  • Demonstrated ownership of production data platforms.
  • Experience in Tier-1 enterprises (FAANG, Fortune 100).

Responsibilities

  • Design data architecture for AI model training.
  • Write production-grade code for data ingestion and transformation.
  • Implement advanced filtering and quality-scoring algorithms.
  • Optimize processing workloads for high throughput.
  • Act as a technical authority on data processing and cloud structures.

Skills

Python
Scala
Spark
Kafka
Airflow

Tools

Snowflake
BigQuery
Redshift

Job description

Staff/Senior Data Engineer: AI Training Data (2-4 Month Contract)

Location: Remote

Role Type: Contract (2-4 Months)

Time Commitment: 40 hrs/week (Full-time availability required)

Compensation: Hyper-competitive hourly rate (matching Tier-1 Staff engineering bands)

Experience: 6-12+ years

About BespokeLabs

BespokeLabs is a premier, VC-backed AI Research lab with an exceptionally talent-dense team of IIT and Ivy League alumni. We don’t just build tooling around AI—we build the massive-scale data systems and reasoning architectures that directly power next-generation models. Our research shapes the frontier of AI: we’ve published breakthroughs like GEPA, driven foundational datasets like OpenThoughts, and shipped state-of-the-art models including Bespoke-MiniCheck and Bespoke-MiniChart. More on our website :)

Role Overview

We are looking for a top-tier Senior/Staff Data Engineer for a high-impact, 2-4 month sprint. You will leverage your deep expertise in enterprise-grade data platforms to architect and build the complex curation systems required for advanced AI model training.

This is not a traditional ETL pipeline role. We need a heavy-hitter who has already operated production data platforms at scale inside large, complex organizations (FAANG, Fortune 100). You will use the mental models, architectural intuition, and coding skills you've developed over your career to generate, transform, and evaluate the data that trains the next generation of AI.

What You Will Do (The Contract)
  • Architect AI-Scale Systems: Design the overarching data architecture and processing topology needed to programmatically curate and shape datasets at TB/PB scale, ensuring low latency and high consistency.
  • Hands-On Development: Write production-grade code (Python/Scala, Spark) to build custom ingestion logic, highly efficient transformation scripts, and performant data validation checks.
  • Complex Data Logic: Implement advanced filtering, deduplication, and quality-scoring algorithms at scale (see the sketch after this list), ensuring the resulting data objects are optimized for LLM/ML consumption.
  • Quality & Performance Tuning: Rigorously test, benchmark, and optimize processing workloads (CPU/memory tuning, partitioning strategies in Spark/Iceberg) to meet aggressive throughput targets.
  • Domain Subject Matter Expert: Act as the ultimate technical authority on distributed systems, data processing, and cloud structures to ensure the training data factory meets enterprise-grade accuracy.
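
For illustration only, here is a minimal PySpark sketch of the kind of deduplication and quality-scoring pass described above. Every path, column name, and the scoring heuristic is a hypothetical placeholder, not Bespoke Labs' actual pipeline.

    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("curation-sketch").getOrCreate()

    # Hypothetical input: JSON documents with doc_id, text, and source fields.
    docs = spark.read.json("s3://bucket/raw-docs/")

    # Exact deduplication on a content hash, keeping one row per hash.
    # (Near-duplicate detection, e.g. MinHash, would be a separate pass.)
    w = Window.partitionBy("content_hash").orderBy("doc_id")
    deduped = (
        docs.withColumn("content_hash", F.sha2(F.col("text"), 256))
        .withColumn("rn", F.row_number().over(w))
        .filter(F.col("rn") == 1)
        .drop("rn")
    )

    # Toy length-based quality score; a production system would combine
    # heuristics with model-based classifiers.
    scored = deduped.withColumn(
        "quality_score",
        F.least(F.length("text") / F.lit(10000.0), F.lit(1.0)),
    )

    # Keep high-scoring documents and partition output by source so
    # downstream training jobs can prune and parallelize reads.
    (
        scored.filter(F.col("quality_score") > 0.5)
        .write.mode("overwrite")
        .partitionBy("source")
        .parquet("s3://bucket/curated-docs/")
    )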
What You Bring to the Table (Your Past Experience)

To be successful in this contract, you must have a track record of:

  • End-to-End Ownership: Designing and owning enterprise data platforms (batch + streaming).
  • High-Throughput Processing: Building and operating Kafka-first streaming pipelines (a minimal example follows this list).
  • Lakehouse Architecture: Utilizing Apache Iceberg, Delta Lake, or Hudi for analytics and ML at scale.
  • Reliability Engineering: Ensuring data reliability through SLAs, monitoring, backfills, and recovery.
  • Scale: Processing billions of events and managing TB–PB scale data systems.
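
As a rough sketch of the Kafka-first, lakehouse-backed pattern these bullets describe, the snippet below uses Spark Structured Streaming to read a Kafka topic and append micro-batches to an Apache Iceberg table. The broker address, topic, catalog/table names, and checkpoint path are all assumed placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Assumes a Spark session configured with the Iceberg runtime and a
    # catalog named "lake"; both are deployment-specific assumptions.
    spark = SparkSession.builder.appName("kafka-to-iceberg").getOrCreate()

    # Read raw events from a (hypothetical) Kafka topic.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "raw-events")
        .load()
    )

    # Kafka delivers key/value as binary; cast to strings for downstream use.
    parsed = events.select(
        F.col("key").cast("string").alias("event_key"),
        F.col("value").cast("string").alias("payload"),
        F.col("timestamp"),
    )

    # Append to the Iceberg table; the checkpoint enables exactly-once
    # recovery, which is what makes restarts and backfills safe.
    query = (
        parsed.writeStream.format("iceberg")
        .outputMode("append")
        .option("checkpointLocation", "s3://bucket/checkpoints/raw-events")
        .trigger(processingTime="1 minute")
        .toTable("lake.db.raw_events")
    )
    query.awaitTermination()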
Required Qualifications (Non-Negotiable)
  • Experience: 6+ years of Data Engineering experience.
  • Seniority: Demonstrated Senior/Staff-level ownership of production data platforms.
  • Pedigree: Background at Tier-1 enterprises (FAANG, large SaaS, Fortune 100).
  • Technical Stack: Deep fluency in Python/Scala, Spark, Kafka, Airflow, and major cloud warehouses (Snowflake, BigQuery, Redshift).