
A stealth-mode AI start-up in Berlin is seeking a Senior Site Reliability Engineer to own the reliability and performance of their GPU-powered infrastructure. This role involves designing large-scale GPU clusters, developing automation pipelines, and collaborating with teams to optimize resource scheduling. The ideal candidate has over 7 years of experience in SRE or DevOps, with strong skills in Kubernetes and Linux systems. This position offers an annual salary of €200,000 and equity as part of the benefits package.
Join a stealth-mode hyperscale data center start-up building an AI and cloud platform powered by thousands of H100s, H200s, and B200s, ready for experimentation, full-scale model training, or inference. As a Senior Site Reliability Engineer, you’ll own the reliability, performance, and automation of this GPU-powered infrastructure, ensuring seamless orchestration across environments managed with Slurm or Kubernetes, or accessed directly over SSH.
This is a rare opportunity to work at the intersection of hyperscale infrastructure and AI, shaping the operational backbone of one of the largest GPU clusters in private deployment.
If you want to build and operate infrastructure for frontier AI workloads, automate systems at petascale, and be part of a founding engineering team, this is the place to do it.
If you are interested in this incredible opportunity, get in touch today! You don't want to miss out!