Senior Resilience Tester

RLDatix

Remote

GBP 60,000 - 80,000

Full time

Yesterday

Job summary

A healthcare technology provider is seeking a Senior Quality Engineer focused on Platform Resilience and Scalability. This UK-based remote role involves designing resilience and performance testing strategies to ensure high availability of the internal developer platform. A strong background in Kubernetes production environments and chaos engineering is essential. You will be responsible for validating scalability and performance optimisations to support global healthcare solutions while collaborating in a fast-paced cloud-native environment.

Qualifications

Strong experience in Kubernetes production environments, preferably EKS.
Proven success in chaos engineering and resilience testing using major frameworks.
Knowledge of distributed systems failure modes.

Responsibilities

Design chaos experiments to validate failure scenarios across clusters.
Test auto-recovery mechanisms to ensure platform resilience.
Analyse performance bottlenecks and optimise system behaviour.

Skills

Experience in Kubernetes production environments

Chaos engineering

Performance tuning

Collaborative working

Tools

Chaos Mesh

AWS Fault Injection Simulator

Honeycomb

CloudWatch

Prometheus

Senior Quality Engineer – Platform Resilience & Scalability | Platform Engineering | UK - Remote

RLDatix (RLD) is on a mission to help raise the standard of care…everywhere. Trusted by over 10,000 healthcare organisations around the world, our solutions help improve health and care. Our applications ensure that patients receive the best and safest care while supporting the providers who deliver it.

Joining TeamRLD means being part of a global effort of over 2,000 team members in making a difference in healthcare…every day.

We’re searching for a UK-based Quality Engineer – Platform Resilience & Scalability to join our Platform Engineering team, so that we can ensure our Internal Developer Platform remains resilient, scalable, and highly available across multiple global regions. The Quality Engineer will design and execute resilience and performance testing strategies to guarantee our platform meets a 99.95% uptime SLA and scales dynamically under demanding conditions.

How You’ll Spend Your Time

Design chaos experiments using tools like Chaos Mesh, Litmus, or AWS Fault Injection Simulator to validate failure scenarios across EKS clusters and regions.
Test auto-recovery mechanisms such as Karpenter autoscaling, pod restarts, and ALB failover to ensure platform resilience.
Analyse performance bottlenecks in Kubernetes clusters, Istio service mesh latency, and GitOps pipeline throughput to optimise system behaviour.
Validate scalability by testing rapid scale-up scenarios and multi-region failover capabilities to support 3,000+ pods per cluster.
Define and monitor SLOs/SLIs for platform services using Honeycomb, CloudWatch, and Prometheus to maintain observability and reliability.

What Kind of Things We’re Most Interested in You Having

Strong experience in Kubernetes production environments (EKS preferred).
Proven success in chaos engineering and resilience testing using major frameworks.
In-depth knowledge of distributed systems failure modes and performance tuning.
Sincere interest in building resilient, scalable platforms that power global healthcare solutions.
A knack for working collaboratively within a fast‑paced, cloud‑native environment.
