
A growing AI startup in Berlin is seeking a software engineer to build evaluation infrastructure using a modern tech stack, including Django and React. The role involves developing scalable applications for AI evaluation, collaborating closely with teams, and working on innovative integrations. Ideal candidates have solid software engineering fundamentals, fluency in English, and are comfortable working on-site. Competitive compensation and opportunities for growth are offered, including stock options.
At ellamind, we build evaluation-first AI infrastructure. Our platform elluminate turns AI evaluation from ad-hoc “vibe checks” into rigorous, repeatable engineering to enable teams to test, measure, and improve LLM applications with confidence.
Build evaluation infrastructure: Develop scalable Django Ninja APIs and Next.js/React interfaces that power structured experiments and automated evaluations.
Design experiment management: Ship intuitive flows for user inputs, evaluation data management, rating criteria, and multi-provider runs.
Deliver analytics that matter: Create dashboards for result summaries, rating distributions, regression tracking, and compliance reports to drive data-informed decisions.
Advance our Python SDK: Extend the client library to create experiments, generate responses, and retrieve results—fitting naturally into modern AI dev workflows.
Optimize for scale: Own async task processing, query optimization, API performance, and containerized deployments to run thousands of evaluations reliably.
Shape the UX: Design for batch testing, scheduling, and collaborative reviews so rigorous evaluation is accessible to whole teams.
Raise the bar on platform quality: Improve CI/CD, containerization, code health, and reviews—establishing best practices across the codebase.
Integrate AI pragmatically: Explore sensible, cutting-edge integrations that improve how teams build, test, and ship LLM apps.
Our platform is built with Python (Django) on the backend, React/Next.js on the frontend, and PostgreSQL as our database. You'll also work with async workers, Docker + Kubernetes, and integrate with multiple LLM providers.
What matters most: We care about engineering excellence, not language expertise. If you're exceptional with other modern frameworks and excited to work in our stack, we want to hear from you. Strong engineers adapt quickly and raise the bar wherever they work.
Must-haves
Experience with modern frontend (React / Next.js or similar) and backend (Django or FastAPI), shipping production features end-to-end.
Solid software engineering fundamentals: API design, data modeling, testing, and performance.
Comfort with Docker/Kubernetes and CI/CD workflows.
Enthusiasm for AI and its possible applications in software development.
On-site collaboration at least 3 days per week in Berlin or Bremen, plus travel to our Bremen HQ during onboarding.
Fluency in English (at least B2).
Valid EU work authorization.
Nice-to-haves
• Hands-on work with LLM apps, evaluation frameworks, prompt/versioning workflows, or developer tooling/SDKs.
• Experience with data-heavy dashboards and analytics; familiarity with async workers (e.g., Celery) and PostgreSQL.
• German language skills.
• Exposure to privacy-sensitive or on-prem deployments.
What matters most
We prioritize demonstrated excellence in your projects and career. If you're motivated to build and optimize AI solutions, we want to hear from you, even if you don't meet every single criterion.
Diversity & inclusion
Different perspectives make us stronger. We welcome applicants from all backgrounds and encourage you to apply.
Shape the future of AI development: You'll have significant influence on our product and technology direction while building critical infrastructure that every serious AI team needs.
Technical excellence meets cutting-edge innovation: Work with a modern, well-architected stack (Django Ninja + Next.js + Python SDK) on complex challenges like distributed systems, multi-LLM integrations, and real-time experiment tracking, without legacy baggage holding you back.
Career-defining opportunity: You'll be building essential AI evaluation infrastructure during a massive market transformation. As systematic testing becomes fundamental to AI development, you'll be at the center of this shift, working on technology that's becoming as critical as version control.
Ownership and impact: Get full end-to-end ownership of features, direct collaboration with AI researchers and ML engineers, and immediate feedback on how your code helps teams ship better AI products. Your engineering decisions directly shape how thousands of developers work.
Competitive package with upside: In addition to a competitive salary, we offer a VSOP (Virtual Stock Option Program) to give you a real stake in the company's success as we grow this essential AI infrastructure.
Best-in-class development experience: Fast and streamlined access to all AI technologies that make your life (and development work) easier, plus the latest tools and platforms to maximize your productivity.
Work environment: Our Bremen office features stunning waterfront views, complimentary beverages, smoothies, and a boat. We're opening our Berlin office at the end of 2025, giving you flexibility as we expand.
Grow with transformative technology: Build deep expertise in AI evaluation and LLM infrastructure alongside our expanding team, mastering the technologies that are reshaping software development while helping define industry standards.
We are a cash-flow-positive, Germany-based AI startup building elluminate, the enterprise platform that turns AI evaluation from ad-hoc experiments into rigorous, repeatable workflows so teams can ship reliable AI with confidence. Teams use elluminate to design test suites, benchmark models, track regressions, and enforce clear, measurable quality gates. We pair elluminate with custom large-language-model solutions and full on-prem deployment options. Our products have already earned the trust of renowned clients such as Deutsche Telekom, the German Federal Government, and leading health insurers like hkk.
Rooted in Bremen and collaborating with leading organizations, our team has a track record in advanced model and dataset development. We like owning problems end-to-end and shipping pragmatically. We contribute to open-source initiatives such as OpenEuroLLM and regularly publish models and tools to accelerate the broader ecosystem.