Join a rapidly growing online travel agency as a Site Reliability Engineer, where technology drives success. You'll be at the forefront of evolving SRE practices, improving system reliability, and ensuring high performance in a dynamic environment. With a focus on observability and performance testing, you will help engineering teams succeed in operations while working with cutting-edge cloud technologies. Enjoy a vibrant workplace culture with opportunities for professional growth and exciting perks like discounted holidays and a generous holiday allowance. This role is perfect for those passionate about technology and making a significant impact in the travel industry.
We are a rapidly growing online travel agency with technology at the heart of our success. In 2022, we sent millions of people on their dream holiday.
With a million visitors a day, our 100+ services handle 8k requests per second, while maintaining p95 search latency of 150ms. Our observability captures and processes 1TB of logs a day and 350k metric samples a second.
We focus on differentiation by relying heavily on open source, while also giving back through contributions to public repositories, open sourcing in-house tools and sponsoring conferences.
As our first Site Reliability Engineer, you will contribute to the evolution of SRE practices such as incident management, blameless postmortems, SLOs and error budgets. You will help build reliable, performant, auto-scalable and highly available systems, with the support of the existing Platform Infrastructure team.
Our engineering teams own the lifecycle of services from first commit to high-load operation in production. Your responsibility will be to help engineering teams succeed at operations, not to run their services for them.
We place a strong focus on observability, continually evolving our monitoring and alerting stack, which is currently centred around the Mimir (Prometheus), Grafana, Loki and Tempo ecosystem. Our service mesh (Linkerd) provides uniform observability of all production services at 10s intervals.
Performance and scalability are integral to our software and infrastructure development process, achieved by combining Computer Science fundamentals with cutting-edge cloud technologies.
loveholidays offer a bespoke way of searching for your next getaway, giving you the chance to personalise your holiday with the ultimate flexibility. Plus, book confidently knowing your holiday is ATOL protected.