What you'll do
As a key player in our data team, you will design and develop the backbone infrastructure that powers our Legal AI agents. You’ll be responsible for transforming hundreds of thousands of historical legal cases into clean, high-quality training data — the foundation for building powerful and scalable AI models.
- Expand and optimize our Databricks/Delta Lakehouse environment on AWS, crafting GDPR-compliant data models, data contracts, and clear data lineage
- Build and maintain robust ELT pipelines using PySpark, dbt, and Airflow, including automated quality checks, dataset versioning, and comprehensive testing
- Lead the design and management of a scalable annotation process, including tooling, guidelines, and quality assurance for a team of 20 paralegals
- Develop transparent, data-driven dashboards (e.g., Tableau) to detect bias, data gaps, and model risks — providing actionable insights to executives and specialist teams
- Define gold standards, adversarial test sets, and evaluation metrics for faithfulness, citation accuracy, and model alignment to ensure rigorous AI agent validation
- Own the Reinforcement Learning from Human Feedback (RLHF) data cycle: curate human feedback datasets, train reward models, and monitor alignment metrics
- Lead, mentor, and grow the data team while collaborating closely with AI engineers on RAG workflows and LLM evaluation
What you bring
- Master’s or PhD degree in Data Science, Statistics, Computer Science, or a related field
- 7+ years of experience designing and operating large-scale data platforms, preferably with Databricks/Delta Lake or equivalent lakehouse technologies
- Fluent English; German is a plus
- Proven track record in managing annotation workflows with 10+ FTEs and integrating labeled data into machine learning pipelines
- Strong expertise in Python, SQL, PySpark, and modern ETL best practices; solid foundation in statistics and experimental design
- Experience with vector databases (e.g., Weaviate, pgvector), LLM evaluation, and human-in-the-loop ML processes
- Hands-on familiarity with AWS (S3, Glue, IAM, Lambda) and Infrastructure as Code tools such as Terraform or Pulumi
- Deep understanding of data privacy, security, and regulatory requirements (GDPR)
- Bonus: experience with legal text corpora
What we promise
Flexible & Inspiring Work Environment
- Office-first culture with one remote workday per week
- Modern, air-conditioned office flooded with natural light
- Prime location in Berlin Mitte between Gleisdreieck and Potsdamer Platz with excellent transport links
- Spacious rooftop terrace with panoramic views over Berlin
Great Benefits
- Complimentary beverages, fresh fruit, and snacks
- Full coverage of your Deutschlandticket for hassle-free commuting
- Pet-friendly office — bring your dog to work!
- Corporate benefits platform offering exclusive discounts and deals
- Regular company and team events to foster community spirit
- Generous company pension plan with 20% employer contribution
- After-work fun: Nintendo Switch, PS5, darts, and table football
Culture & Collaboration
- Agile mindset and open communication: every voice matters
- Meaningful projects combining cutting-edge tech and practical legal expertise
- A feedback culture that drives real personal and professional growth
Hiring Process
- Quick feedback turnaround
- 20-minute phone interview with recruiting/HR
- 60-minute on-site interview with your future manager and team
- 30-minute meeting with management
- Prompt job offer