Senior Site Reliability Engineer

loveholidays

London

On-site

GBP 40,000 - 80,000

Full time

30+ days ago

Job summary

Join a rapidly growing online travel agency as a Site Reliability Engineer, where technology drives success. You'll be at the forefront of evolving SRE practices, improving system reliability, and ensuring high performance in a dynamic environment. With a focus on observability and performance testing, you will help engineering teams succeed in operations while working with cutting-edge cloud technologies. Enjoy a vibrant workplace culture with opportunities for professional growth and exciting perks like discounted holidays and a generous holiday allowance. This role is perfect for those passionate about technology and making a significant impact in the travel industry.

Benefits

Company pension contributions at 5%

Training budget

Discounted holidays

25 days of holidays per annum

Ability to buy and sell annual leave

Cycle to work scheme

Season ticket loan

Eye care vouchers

Qualifications

Experience in Site Reliability Engineering and incident management practices.
Strong understanding of performance testing and observability tools.

Responsibilities

Contribute to SRE practices like incident management and error budgets.
Improve reliability KPIs and balance feature delivery with reliability.

Skills

Site Reliability Engineering

Incident Management

Performance Testing

Low-level Debugging

Observability

Education

Bachelor's in Computer Science

Tools

Prometheus

Grafana

Loki

Tempo

Java Flight Recorder

Go’s pprof

Linkerd

We are a rapidly growing online travel agency with technology at the heart of our success. In 2022, we sent millions of people on their dream holiday.

With a million visitors a day, our 100+ services handle 8k requests per second, while maintaining p95 search latency of 150ms. Our observability captures and processes 1TB of logs a day and 350k metric samples a second.

We focus on differentiation by relying heavily on open source, while also giving back through contributions to public repositories, open sourcing in-house tools and sponsoring conferences.

Responsibilities

As our first Site Reliability Engineer, you will contribute to the evolution of SRE practices like incident management, blameless postmortems, SLOs and error budgets. You will contribute to building reliable, performant, auto-scalable and highly available systems. You will have the support of the existing Platform Infrastructure team.

Level up SRE practices across the teams.
Improve the reliability KPIs of the platform.
Help balance reliability with feature delivery using SLOs and error budgets.
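
To make the error budget idea concrete, here is a minimal sketch of the arithmetic for a request-based availability SLO. The function name `error_budget`, the 99.9% target and the request counts are illustrative assumptions; none of them come from this posting or describe how the team actually measures its SLOs.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Summarise error budget use for a request-based availability SLO.

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    All values passed in here are hypothetical, for illustration only.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("invalid request counts")

    # The budget is the share of requests that are allowed to fail.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed_fraction = failed_requests / allowed_failures

    return {
        "allowed_failures": allowed_failures,
        "consumed_fraction": consumed_fraction,
        # Once the budget is spent, reliability work takes priority
        # over shipping new features.
        "exhausted": consumed_fraction >= 1.0,
    }


if __name__ == "__main__":
    # Assumed example: 99.9% target, 1,000,000 requests, 250 failures.
    print(error_budget(0.999, 1_000_000, 250))
```

With these assumed numbers the budget allows about 1,000 failed requests, so 250 failures use roughly a quarter of it and feature delivery can continue.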

Our engineering teams own the lifecycle of services from first commit to high-load operation in production. Your responsibility will be to help engineering teams succeed at operations, not to run their services for them.

What you'll be working on

Exposing slow running code paths in critical applications using tools like Java Flight Recorder or Go’s pprof.
Writing tools or modifying existing applications with reliability and performance in mind.
Ensuring our systems and their individual components can withstand 10x load by improving our performance testing.
Shortening mean time to discovery and recovery with improvements to observability and alerting.

We place a strong focus on observability, continually evolving our monitoring and alerting stack, currently centred around the Mimir (Prometheus), Grafana, Loki and Tempo ecosystem. Our service mesh (Linkerd) provides uniform observability of all production services at 10s intervals.

Performance and scalability are integral to our software and infrastructure development process, achieved by combining Computer Science fundamentals and cutting-edge cloud technologies.

Low-level debugging and troubleshooting.

What we'll give back to you

Company pension contributions at 5%.
Training budget for you to learn on the job and level yourself up.
Discounted holidays for you, your family and friends.
25 days of holiday per annum (plus 8 public holidays), increasing by 1 day for every second year of service, up to a maximum of 30 days per annum.
Ability to buy and sell annual leave.
Cycle to work scheme, season ticket loan and eye care vouchers.

About the company

loveholidays offer a bespoke way of searching for your next getaway, giving you the chance to personalise your holiday with the ultimate flexibility. Plus, book confidently knowing your holiday is ATOL protected.
