SRE / Platform Engineer

Peec AI

Berlin

On-site

EUR 70,000 - 90,000

Full-time

Posted today

Summary

A fast-growing tech startup in Berlin is seeking a Site Reliability Engineer to enhance the reliability and scalability of their systems. The ideal candidate has over 5 years of experience in similar roles, deep expertise in Infrastructure as Code, and strong programming skills. This role offers significant ownership within a dynamic environment, competitive equity compensation, and a vibrant office space.

Benefits

Equity compensation package
Paid Uber Eats when working late
Team events and off-sites
Beautiful office environment in Berlin

Qualifications

  • 5+ years of experience in Site Reliability Engineering or similar.
  • Proficient with major cloud platforms (GCP, AWS, Azure).
  • Strong programming skills for automation and tooling.

Responsibilities

  • Own the reliability, scalability, and performance of systems.
  • Design and maintain tooling for service performance.
  • Develop incident response practices.

Skills

Site Reliability Engineering
Infrastructure Engineering
Terraform
Datadog
AWS
Python

Tools

Kubernetes
CI/CD
CloudFormation

Job description

What you’ll do
  • Own the reliability, scalability, and performance of Peec AI’s core systems and infrastructure

  • Design, build, and maintain the tooling, automation, and monitoring that keep our services fast, secure, and highly available

  • Partner closely with product and engineering teams to ensure new features are reliable, observable, and easy to operate from day one

  • Develop and refine incident response practices, ensuring issues are triaged quickly and resolved with minimal user impact

  • Proactively identify and address bottlenecks, single points of failure, and operational inefficiencies across the stack

  • Champion operational excellence and a culture of reliability, driving best practices across the engineering organization

What we’re looking for
  • 5+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or similar roles supporting production systems at scale

  • Deep expertise with Infrastructure as Code tools (Terraform, Pulumi, CloudFormation, etc.)

  • Strong experience with observability platforms (e.g., Datadog, Sentry, Prometheus, Grafana) and incident response tooling (PagerDuty, Incident.io, or similar)

  • Proven proficiency with major cloud platforms (GCP, AWS, or Azure) and modern distributed systems

  • Strong programming and scripting skills (e.g., TypeScript and Python) for automation and tooling

  • A track record of diagnosing complex system problems and implementing robust, long‑term solutions

  • Solid understanding of CI/CD, Kubernetes, containerization, networking, databases, and cloud security principles

  • Excellent problem‑solving skills, attention to detail, and a strong commitment to operational excellence

Bonus Points
  • Experience supporting AI/ML workloads or data‑intensive systems

  • Prior SRE experience in a high‑growth startup or globally distributed infrastructure environment

  • Familiarity with zero‑downtime migrations, multi‑region architectures, or compliance frameworks

What we offer
  • Exciting and challenging work with real impact and ownership at one of Europe’s fastest‑growing Series A startups

  • Regular team events and off‑sites

  • Aggressive equity compensation package

  • Paid Uber Eats & Uber home when working late

  • The most beautiful office space and work environment in Berlin
