SRE/Backend Engineer

Madfish

United Kingdom

Remote

GBP 50,000 - 70,000

Full time

2 days ago

Job summary

A project management platform provider in the United Kingdom is looking for a driven Site Reliability Engineer to enhance the stability and reliability of their cloud-based infrastructure. Candidates should have over 3 years of experience in backend or site reliability engineering, with skills in Redis, PostgreSQL, and AWS services. This role offers the opportunity to work on a platform used by thousands daily, ensuring optimal performance and reliability.

Qualifications

  • 3+ years of experience in backend or site reliability engineering.
  • Solid understanding of distributed systems, backend performance, and fault tolerance.
  • Proven experience debugging production issues.

Responsibilities

  • Investigate and resolve complex system failures.
  • Develop automations, runbooks, and automated responses.
  • Collaborate with cross-functional teams.

Skills

Backend or site reliability engineering experience
Redis
PostgreSQL
AWS services (DynamoDB, Aurora)
Terraform
Nginx
Datadog
Problem-solving skills
Intermediate English

Tools

Docker
ECS
Kubernetes
Node.js
TypeScript

Job description
Requirements
  • 3+ years of experience in backend or site reliability engineering
  • Experience with Redis, PostgreSQL, and AWS services (DynamoDB, Aurora)
  • Experience with Terraform, Nginx, and Datadog for infrastructure management and monitoring
  • Solid understanding of distributed systems, backend performance, and fault tolerance
  • Proven experience debugging production issues and improving system reliability
  • Strong problem-solving, analytical, and troubleshooting skills
  • Intermediate or higher level of English (written and spoken)
Would be a plus
  • Experience writing runbooks, automated mitigations, or self-healing systems
  • Knowledge of observability best practices (metrics, logs, tracing)
  • Experience with containerized environments (Docker, ECS, or Kubernetes)
  • Prior work in on-call rotations and incident management
  • Strong proficiency in Node.js and TypeScript
Responsibilities
  • Investigate and resolve complex system failures (e.g., SLO breaches or service degradation across regions)
  • Identify the root cause of production defects and fix them, ensuring long-term reliability improvements
  • Develop automations, runbooks, and automated responses to reduce operational load and improve incident response
  • Build optimizations and reliability enhancements for backend workloads
  • Ensure proper testing, observability, and documentation for all changes
  • Collaborate with cross-functional teams to share insights and prevent future incidents
  • Continuously improve service performance, scalability, and monitoring coverage
About the project

The product is a project management platform. Its own API supports integrations with the communication, sales, analytics, and other applications that co-workers use on a daily basis.

We are seeking driven and innovative software engineers with a strong background in, or passion for, Site Reliability Engineering (SRE) to help build the ultimate all-in-one productivity platform. As an SRE, you will play a key role in enhancing the stability, availability, and reliability of our globally distributed, cloud-based infrastructure that supports thousands of users daily.
