
A fast-growing tech startup in Berlin is seeking a Site Reliability Engineer to enhance the reliability and scalability of their systems. The ideal candidate has over 5 years of experience in similar roles, deep expertise in Infrastructure as Code, and strong programming skills. This role offers significant ownership within a dynamic environment, competitive equity compensation, and a vibrant office space.
Own the reliability, scalability, and performance of Peec AI’s core systems and infrastructure
Design, build, and maintain the tooling, automation, and monitoring that keep our services fast, secure, and highly available
Partner closely with product and engineering teams to ensure new features are reliable, observable, and easy to operate from day one
Develop and refine incident response practices, ensuring issues are triaged quickly and resolved with minimal user impact
Proactively identify and address bottlenecks, single points of failure, and operational inefficiencies across the stack
Champion operational excellence and a culture of reliability, driving best practices across the engineering organization
5+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or similar roles supporting production systems at scale
Deep expertise with Infrastructure as Code tools (Terraform, Pulumi, CloudFormation, etc.)
Strong experience with observability platforms (e.g., Datadog, Sentry, Prometheus, Grafana) and incident response tooling (e.g., PagerDuty, incident.io, or similar)
Proven proficiency with major cloud platforms (GCP, AWS, or Azure) and modern distributed systems
Strong programming and scripting skills (e.g., TypeScript and Python) for automation and tooling
A track record of diagnosing complex system problems and implementing robust, long‑term solutions
Solid understanding of CI/CD, Kubernetes, containerization, networking, databases, and cloud security principles
Excellent problem‑solving skills, attention to detail, and a strong commitment to operational excellence
Experience supporting AI/ML workloads or data‑intensive systems
Prior SRE experience in a high‑growth startup or globally distributed infrastructure environment
Familiarity with zero‑downtime migrations, multi‑region architectures, or compliance frameworks
Exciting and challenging work with real impact and ownership at one of Europe’s fastest‑growing Series A startups
Regular team events and off‑sites
Aggressive equity compensation package
Paid Uber Eats meals and Uber rides home when working late
The most beautiful office space and work environment in Berlin