SRE/Backend Engineer

Madfish

United Kingdom

Remote

GBP 50,000 - 70,000

Full time

2 days ago

Job summary

A project management platform provider in the United Kingdom is looking for a driven Site Reliability Engineer to enhance the stability and reliability of their cloud-based infrastructure. Candidates should have over 3 years of experience in backend or site reliability engineering, with skills in Redis, PostgreSQL, and AWS services. This role offers the opportunity to work on a platform used by thousands daily, ensuring optimal performance and reliability.

Qualifications

3+ years of experience in backend or site reliability engineering.
Solid understanding of distributed systems, backend performance, and fault tolerance.
Proven experience debugging production issues.

Responsibilities

Investigate and resolve complex system failures.
Develop automations, runbooks, and automated responses.
Collaborate with cross-functional teams.

Skills

Backend or site reliability engineering experience

Redis

PostgreSQL

AWS services (DynamoDB, Aurora)

Terraform

Nginx

Datadog

Problem-solving skills

Intermediate English

Tools

Docker

ECS

Kubernetes

Node.js

TypeScript

Requirements

3+ years of experience in backend or site reliability engineering
Experience with Redis, PostgreSQL, and AWS services (DynamoDB, Aurora)
Experience with Terraform, Nginx, and Datadog for infrastructure management and monitoring
Solid understanding of distributed systems, backend performance, and fault tolerance
Proven experience debugging production issues and improving system reliability
Strong problem-solving, analytical, and troubleshooting skills
Intermediate or higher level of English (written and spoken)

Would be a plus

Experience writing runbooks, automated mitigations, or self-healing systems
Knowledge of observability best practices (metrics, logs, tracing)
Experience with containerized environments (Docker, ECS, or Kubernetes)
Prior work in on-call rotations and incident management
Strong proficiency in Node.js and TypeScript

Responsibilities

Investigate and resolve complex system failures (e.g., SLO breaches or service degradation across regions)
Identify the root causes of production defects and fix them, ensuring long-term reliability improvements
Develop automations, runbooks, and automated responses to reduce operational load and improve incident response
Build performance optimizations and reliability enhancements for backend workloads
Ensure proper testing, observability, and documentation for all changes
Collaborate with cross-functional teams to share insights and prevent future incidents
Continuously improve service performance, scalability, and monitoring coverage

About the project

The product is a project management platform. Its own API supports integrations with the communication, sales, analytics, and other applications that co-workers use daily.

We are seeking driven and innovative software engineers with a strong background in, or passion for, Site Reliability Engineering (SRE) to help build the ultimate all-in-one productivity platform. As an SRE, you will play a key role in enhancing the stability, availability, and reliability of our globally distributed, cloud-based infrastructure that supports thousands of users daily.
