Senior Data Scientist (f/m/d)

Legalhero

Berlin

Hybrid

EUR 85,000 - 120,000

Full-time

Posted 2 days ago

Summary

Legalhero, a trailblazer in Legal AI, seeks a data team lead to develop the infrastructure behind its AI models. The ideal candidate has extensive experience with data platforms, will manage a team, and will contribute to innovative projects while ensuring compliance with data privacy regulations.

Benefits

Flexible work environment
Pet-friendly office
Complimentary beverages and snacks
Company pension plan with 20% employer contribution
Office-first culture with one remote workday per week
Regular team events
Modern office with rooftop terrace

Qualifications

  • 7+ years of experience designing and operating large-scale data platforms.
  • Proven track record of managing annotation workflows with 10+ FTEs.
  • Deep understanding of data privacy, security, and GDPR.

Responsibilities

  • Design and develop backbone infrastructure for Legal AI agents.
  • Build and maintain robust ELT pipelines using PySpark and dbt.
  • Lead and mentor the data team while collaborating closely with AI engineers.

Skills

Python
SQL
PySpark
Statistics
Experimental Design

Education

Master's or PhD in Data Science, Statistics, Computer Science, or a related field

Tools

AWS
Terraform
Pulumi
Databricks

Job Description

What You'll Do

As a key player in our data team, you will design and develop the backbone infrastructure that powers our Legal AI agents. You’ll be responsible for transforming hundreds of thousands of historical legal cases into clean, high-quality training data — the foundation for building powerful and scalable AI models.

  • Expand and optimize our Databricks/Delta Lakehouse environment on AWS, crafting GDPR-compliant data models, data contracts, and clear data lineage
  • Build and maintain robust ELT pipelines using PySpark, dbt, and Airflow, including automated quality checks, dataset versioning, and comprehensive testing
  • Lead the design and management of a scalable annotation process, including tooling, guidelines, and quality assurance for a team of 20 paralegals
  • Develop transparent, data-driven dashboards (e.g., Tableau) to detect bias, data gaps, and model risks — providing actionable insights to executives and specialist teams
  • Define gold standards, adversarial test sets, and evaluation metrics for faithfulness, citation accuracy, and model alignment to ensure rigorous AI agent validation
  • Own the Reinforcement Learning from Human Feedback (RLHF) data cycle: curate human feedback datasets, train reward models, and monitor alignment metrics
  • Lead, mentor, and grow the data team while collaborating closely with AI engineers on RAG workflows and LLM evaluation

What you bring

  • Master’s or PhD degree in Data Science, Statistics, Computer Science, or a related field
  • 7+ years of experience designing and operating large-scale data platforms, preferably with Databricks/Delta Lake or equivalent lakehouse technologies
  • Fluent English skills — German is a plus
  • Proven track record in managing annotation workflows with 10+ FTEs and integrating labeled data into machine learning pipelines
  • Strong expertise in Python, SQL, PySpark, and modern ETL best practices; solid foundation in statistics and experimental design
  • Experience with vector databases (e.g., Weaviate, pgvector), LLM evaluation, and human-in-the-loop ML processes
  • Hands-on familiarity with AWS (S3, Glue, IAM, Lambda) and Infrastructure as Code tools such as Terraform or Pulumi
  • Deep understanding of data privacy, security, and regulatory requirements (GDPR)
  • Bonus: experience with legal text corpora

What we promise

Flexible & Inspiring Work Environment

  • Office-first culture with one remote workday per week
  • Modern, air-conditioned office flooded with natural light
  • Prime location in Berlin Mitte between Gleisdreieck and Potsdamer Platz with excellent transport links
  • Spacious rooftop terrace with panoramic views over Berlin

Great Benefits

  • Complimentary beverages, fresh fruit, and snacks
  • Full coverage of your Deutschlandticket for hassle-free commuting
  • Pet-friendly office — bring your dog to work!
  • Corporate benefits platform offering exclusive discounts and deals
  • Regular company and team events to foster community spirit
  • Generous company pension plan with 20% employer contribution
  • After-work fun: Nintendo Switch, PS5, darts, and table football

Culture & Collaboration

  • Agile mindset and open communication: every voice matters
  • Meaningful projects combining cutting-edge tech and practical legal expertise
  • A feedback culture that drives real personal and professional growth

Hiring Process

  • Quick feedback turnaround
  • 20-minute phone interview with recruiting/HR
  • 60-minute on-site interview with your future manager and team
  • 30-minute meeting with management
  • Prompt job offer