Site Reliability Engineer - Cloud Infrastructure (all genders)

gridX GmbH

Munich, Aachen

Hybrid

EUR 65,000 - 85,000

Full-time

Yesterday

Summary

A leading technology company in Germany is seeking an experienced engineer to enhance their cloud and container infrastructure. The ideal candidate will have robust experience in SRE roles and strong hands-on knowledge of AWS services and Kubernetes. You will drive observability, manage complex systems, and facilitate self-service capabilities for engineering teams. Enjoy a flexible working environment and a range of benefits to support personal well-being and development.

Benefits

70 days remote work from the EU

30 days vacation

30 Euro sports allowance

Mental health management offers

1,500 Euro annual development budget

Employee discounts

Pension plan contributions

30 Euro city travel subsidy

Modern IT equipment

Annual Teamweek

Regular team events

Charity donation on birthday

Sabbatical option

Responsibilities

Evolve multi-tenant cloud and container infrastructure.
Engineer Infrastructure as Software through high-quality code.
Drive observability and architectural decisions.
Proactively architect for resilience and incident management.
Build self-service capabilities for engineering teams.

Skills

Experience in an SRE or Platform role

Solid hands-on experience with major public cloud providers

Pragmatic software engineering mindset

Operational experience with Kubernetes at scale

Reliability First mindset

Strong skills in modern programming languages

Expertise in observability stacks

Tools

AWS services (EKS, EC2, VPC, RDS, etc.)

Kubernetes

GitOps

TCP/IP, DNS, and HTTP protocols

How you can contribute to gridX

Please note: This position requires on-site work or remote work from within Germany.

Do stuff that matters - Become a part of gridX and help us digitalise the energy industry, making renewable energy accessible and affordable everywhere. #getshitdone

At gridX, we are building the digital brain for the energy transition. We are looking for an engineer who wants their code to have a tangible impact on a sustainable future.

As part of the SRE Cloud Infrastructure team, you will join a culture defined by a single principle: "Reliability First". For us, however, reliability isn't about fixing broken things or keeping the lights on manually. It’s about enablement. We engineer the automated, self-service solutions that empower our engineering teams to own their services from development to production.

You are a builder. An experienced, autonomous engineer who is ready to evolve our systems, engineer away complexity and champion a culture where reliability is built in by design.

Take Ownership: You actively evolve our multi-tenant cloud and container infrastructure. You take end-to-end ownership of various components, ensuring they are secure, scalable, observable, and cost-efficient.
Engineer Infrastructure as Software: You bring a developer's mindset to operations. You solve complexity by writing high-quality code and automation, ensuring our platform is managed strictly via declarative code.
Drive Observability: You mature our observability platform, ensuring we aren't just collecting data but providing the insights teams need to drive architectural decisions, improve performance, and establish meaningful SLOs.
Architect for Resilience: You proactively identify bottlenecks before they become incidents and, when things do break, you lead the resolution and drive post-mortems to ensure we learn.
Empower Others: You build self-service capabilities that allow engineering teams to own their full lifecycle. You also drive the adoption of best practices through code or architecture reviews and technical deep-dives, and share your expertise through high-quality documentation and operational runbooks.

This is how you and your application stand out

You have solid experience in an SRE or Platform role, building and managing distributed systems in production environments. You are comfortable working with a high degree of autonomy, navigating ambiguity and driving technical initiatives end-to-end.
You have strong hands-on experience with a major public cloud provider. You understand the architectural foundations of cloud infrastructure (Compute, Storage, Networking, and IAM) and are fluent in managing them as code.
You apply a pragmatic software engineering mindset to operations. You write clean, maintainable code and scripts, prioritizing long-term stability and quality.
You have operational experience with Kubernetes at scale, understanding how to manage upgrades, security and resource allocation in a production cluster.
You embody a "Reliability First" mindset, understanding incident lifecycle management and the importance of psychological safety in engineering.

What sets you apart

You have hands-on expertise in the AWS services we use heavily, such as EKS, EC2, VPC, RDS, Lambda, S3, Kinesis, DynamoDB, SNS and SQS.
You go beyond usage and understand the internal components of Kubernetes (scheduling, API server, controllers, RBAC). Experience writing custom Controllers or Operators is a significant plus.
You have strong skills in at least one modern programming language (e.g., Go, TypeScript, Java, Python, Rust), are willing to work with Go, which is our core language for tooling and automation, and embrace AI-assisted workflows to accelerate development.
You have expertise in modern observability stacks (e.g., Grafana LGTM, Thanos, VictoriaMetrics). You can operate and tune the platform at scale, while guiding teams on effective instrumentation and alerting strategies.
You have deep technical expertise in Release Engineering and GitOps, as well as in maintaining infrastructure that enables developers to release their software securely and reliably.
You have deep knowledge of TCP/IP, DNS, and HTTP protocols, and you understand the intricacies of container networking.

Your strengths

Why gridX

Flexible & mobile working: Work remotely for up to 70 days from anywhere in the EU and other selected countries such as Indonesia, Canada, Brazil and many more
Vacation: 30 days for your relaxation
Sports: 30 Euro allowance for Urban Sports Club or E-Gym Wellpass
Health: Make use of our (mental) health management offers such as Nilo.health (e.g. 1:1 coaching sessions, daily meditation offers, self-reflection options) for your mental well-being
Personal development: Annual development budget of 1,500 Euros per employee
Employee discounts: Access to gridX Corporate Benefits
Stay fit and save the planet with our JobRad offer
Set up a pension plan and receive a fair monthly contribution
City travel subsidy: 30 Euros monthly allowance for your monthly/annual ticket
Modern workplace in the hearts of Aachen and Munich with IT equipment of your choice (Apple or Lenovo)
Annual Teamweek: Enjoy an unforgettable off-site, face extraordinary challenges together with all gridX teams and create unforgettable memories!
Experience the gridX culture at regular team events and receive 100 Euros on top per employee for your department event
We will donate 20 Euros to a charity of your choice on your birthday
Sabbatical option: Take a break from the daily work routine and realize personal projects, travel or further education (depends on length of employment)
