Site Reliability Engineer (f/m/x)

ilert GmbH

Cologne

On-site

Confidential

Full-time

20 days ago

Summary

A leading tech company in Cologne is seeking a Site Reliability Engineer to improve the reliability and performance of its platform. This hybrid role involves working with AWS and Kubernetes and optimizing distributed systems such as Kafka and ClickHouse. The ideal candidate has at least three years of experience and a strong foundation in infrastructure automation. Benefits include two remote days per week, 28 days off, and a working culture that protects focus time.

Benefits

28 days off
Subsidised public transport
Hybrid work model

Qualifications

  • 3+ years of experience as SRE, Platform Engineer, DevOps Engineer, or Infrastructure Engineer.
  • Strong hands-on experience with AWS, Kubernetes, Linux internals, and networking.
  • Fluent English; German is optional.

Tasks

  • Run and evolve AWS-based infrastructure.
  • Build and maintain SLOs, SLIs, and observability dashboards.
  • Automate operations with Terraform and Kubernetes.

Skills

AWS
Kubernetes
Linux internals
Networking
Performance tuning
Terraform
CI/CD systems

Tools

Kafka
ClickHouse

Job description

Location: Hybrid – Cologne (Rheinauhafen) — 3 days in the office, 2 remote (Tue + Thu)
Team: Engineering · Reports to CTO

Keep the world awake — build reliability at scale

ilert helps thousands of DevOps & IT teams detect, fix, and communicate incidents faster.

Our platform is mission-critical: customers rely on us 24/7 to keep their always-on businesses running.

As a Site Reliability Engineer at ilert, you’ll own the reliability, performance, and scalability of our core platform across AWS, Kubernetes, Kafka, and more.

Tasks

Build & operate a highly available platform

  • Run and evolve our AWS-based infrastructure
  • Operate and optimize self-managed Kafka and ClickHouse clusters and our observability stack
  • Ensure resilience, disaster recovery, and capacity planning across the stack

Improve reliability & performance

  • Build and maintain SLOs, SLIs, error budgets, and observability dashboards
  • Debug production issues across layers (networking, Kubernetes, application, DB)
  • Improve performance of our ingestion pipeline

Automation & tooling

  • Automate operations with Terraform, Helm, Kubernetes operators, and internal tooling
  • Build tooling for safer deploys, blue/green rollouts, and automated verification
  • Strengthen incident response workflows through deep collaboration with our AI SRE agent team

Security & compliance

  • Implement best practices for workload isolation, secrets management, IAM, and auditability
  • Support our ISO 27001 posture by automating controls and hardening our infrastructure

Cross-functional impact

  • Partner with Backend, AI, and Product teams to design reliable services
  • Participate in on-call rotation
  • Lead post-incident reviews and drive long-term reliability improvements

Requirements

  • 3+ years of experience as SRE, Platform Engineer, DevOps Engineer, or Infrastructure Engineer
  • Strong hands-on experience with AWS, Kubernetes, Linux internals, networking, and performance tuning
  • Experience operating self-managed distributed systems, ideally Kafka or ClickHouse
  • Strong understanding of observability
  • Experience automating infrastructure with Terraform and CI/CD systems
  • Fluent English (our working language); German optional

Benefits

  • 🚀 Product-centric - 100% focused on solving a mission-critical pain felt by every always-on business
  • 🏡 Hybrid freedom - 2 days remote by default; gorgeous Rheinauhafen roof terrace when you’re in town
  • 🕒 Focus > meetings - We time-box syncs, favour async docs and protect maker time
  • 🌴 28 days off - …plus public holidays
  • 🚲 Commute perks - subsidised public transport