In the dynamic landscape of On, technology thrives much like a spirited runner: always moving, always improving. We are building technology that continues to supercharge On’s growth, helping to ignite the human spirit through movement. We’re seeking a Site Reliability Engineer to ensure our digital platforms deliver exceptional performance, reliability, and scalability to support our global customer base.
As a Site Reliability Engineer (SRE) at On, you will play an important role in building and maintaining our cloud infrastructure to support our e-commerce platforms, customer-facing applications, and internal systems. You will work closely with engineering teams to improve reliability, optimize performance, and implement automation solutions.
Your Mission
- System Reliability & Performance: Contribute to high availability (99.99%+ uptime), scalability, and performance of On’s digital platforms through proactive optimization and robust infrastructure design.
- Infrastructure Development: Build and maintain cloud-based infrastructure using Infrastructure-as-Code (IaC) tools.
- Automation: Develop and implement automation solutions to streamline deployments, reduce toil, and enhance monitoring.
- Incident Response: Lead incident resolution, troubleshoot issues, and perform root cause analyses to minimize downtime and improve system resilience.
- Monitoring & Observability: Improve and maintain monitoring, logging, and alerting solutions to ensure proactive issue detection and resolution.
- Collaboration: Partner with the SRE team and software engineers to identify opportunities and to develop and roll out major features.
- Compliance & Security: Integrate security best practices into our systems and solutions.
Your Story
- Experience in site reliability engineering with a track record of managing complex, high-traffic systems.
- Expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, GKE).
- Proficiency in scripting and programming (e.g., Python, Go) for automation and tooling.
- Experience with CI/CD pipelines (ArgoCD, GitHub Actions) and IaC (Terraform).
- Solid understanding of networking, load balancing, and DNS management.
- Experience with observability and monitoring for cloud native environments.
- Strong analytical skills with a proactive approach to resolving complex technical challenges.
- Excellent communication skills, with the ability to explain technical concepts to diverse stakeholders.
Nice to Have
- Background with e-commerce platforms or high-traffic consumer applications.
- Experience in platform engineering, dedicated to building solutions that enhance developer experience (DevEx) and boost software development efficiency.
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Meet The Team
You will join a skilled and dynamic team of cloud and site reliability engineers dedicated to transforming On’s technological foundation. We are crafting scalable, resilient cloud solutions to power internal operations, enhance product performance, and support On’s growth. As a key member of our team, you will shape our cloud infrastructure strategy, ensuring robust, efficient, and sustainable systems that drive innovation. Join us in Berlin to make a lasting impact on On’s digital future!
What We Offer
On is a place centered around growth and progress. We offer an environment designed to give people the tools to develop holistically: to stay active, to learn, explore, and innovate. Our distinctive approach combines a supportive, team-oriented atmosphere with access to personal self-care for both physical and mental well-being, so each person is led by purpose. On is an Equal Opportunity Employer. We are committed to creating a work environment that is fair and inclusive, where all decisions related to recruitment, advancement, and retention are free of discrimination.