Senior Site Reliability Engineer - Databases (Remote, UK)

Grafana Labs

United Kingdom

Remote

GBP 84,000 - 102,000

Full time

3 days ago

Job summary

A leading cloud service provider is seeking a Senior Site Reliability Engineer for remote work in the UK. This role involves enhancing the reliability of cloud databases, collaborating with cross-functional teams, and overseeing software configurations. Candidates should have extensive engineering experience and strong communication skills. Salary range is £84,841 - £101,809 with benefits including equity and bonuses.

Benefits

Equity
Bonuses
Flexible work hours

Qualifications

  • 6+ years of engineering experience, with 3+ years in SRE roles.
  • Experience as a reliability/production engineer or software engineer.
  • Knowledge of Linux internals and cloud storage.
  • Ability to work autonomously and within a team environment.

Responsibilities

  • Conduct regular 1:1s with your manager and colleagues.
  • Review and set SLOs, and work on monitoring and automation improvements.
  • Enhance observability within customer environments.
  • Collaborate on product strategy and technical designs.

Skills

Engineering experience
SRE experience
Strong communication skills
Kubernetes experience
Proficiency in programming languages
Troubleshooting skills
Incident response experience
Autonomous work ability

Tools

Kubernetes
Helm charts
AWS
GCP
Azure

Job description

Job Category: Other
EU work permit required: Yes
Job Reference: a95cc6c0d0f0
Job Views: 16
Posted: 12.08.2025
Expiry Date: 26.09.2025

Job Description:

Senior Site Reliability Engineer - Databases

This is a remote position and we're considering candidates in Spain, Sweden, the UK, and Germany.

About the role:

We are looking for a Senior SRE to support our highest-value Grafana Cloud customers by increasing the reliability of our cloud databases, which are based on Mimir, Loki, Tempo, and Pyroscope. These databases are provided as SaaS from AWS, GCP, and Azure across all regions.

The SRE team is a new team within the Databases department, owning environments for our largest customers and acting as an overlay to existing database teams. As an SRE, you will manage software configurations, participate in feature development, oversee releases, and ensure they meet SLOs without degrading user experience. You will contribute to design documents, code reviews, and other engineering activities to improve reliability, observability, and customer guidance.

This role includes an on-call element, shared with the Mimir team. On-call work focuses on customer experience, and you are supported by another engineer. Our company hires globally (remote-only) to optimize on-call health and to align on-call with 12 daylight hours per day.

What we seek:

  • At least 6 years of engineering experience, with 3+ years in SRE roles.
  • Experience as a reliability/production engineer, infrastructure/systems engineer, or software engineer with an infrastructure focus.
  • Strong communication skills for technical discussions and cross-organizational collaboration.
  • Experience with Kubernetes on AWS, GCP, or Azure, and with Helm charts or other IaC tools.
  • Experience with SRE practices, distributed computing, and related areas.
  • Proficiency in programming languages such as Go, Python, Java, etc.
  • Knowledge of Linux internals, networking, cloud storage, and scaling.
  • Excellent troubleshooting skills.
  • Experience in incident response, post-incident reviews, and proactive problem management.
  • Ability to work autonomously within a team environment.
  • Values that include curiosity, transparency, a bias for action, and kindness.

Your day-to-day will include:

  • Conducting regular 1:1s with your manager and colleagues.
  • Reviewing and setting SLOs, and working on improvements like monitoring, automation, self-healing, and auto-scaling.
  • Enhancing observability within customer environments.
  • Designing and implementing solutions for reliability and scalability.
  • Creating fault-tolerant patterns considering the entire service lifecycle.
  • Collaborating on product strategy, roadmaps, and technical designs.
  • Participating in PR reviews and design discussions.
  • Sharing knowledge about SRE best practices.
  • Engaging in incident response, investigation, PIRs, and customer communication.

In the UK, the base salary range is £84,841 - £101,809. Compensation varies based on experience and skills. Benefits include equity and bonuses, among others. If you apply from a different country, the recruiter will discuss the pay and benefits specific to that country.
