Role: Site Reliability Engineer
Location: Hybrid / Remote (UK-based)
Tech Stack: AWS, MongoDB, Docker, CI/CD, Prometheus, Python
Why This Role
Looking to work at the intersection of DevOps, backend engineering, and real-time problem-solving? Here’s your chance to make a real impact in a high-scale cloud environment, keeping production systems fast, reliable, and resilient for thousands of users.
You’ll join a collaborative, tech-savvy team dedicated to making things work better. From improving observability across microservices to responding to high-priority incidents, this is your chance to shape how scalable applications are delivered and supported.
What You’ll Be Doing
- Fix and improve: Hunt down bugs in live microservices and make production more stable every day.
- Pair up with engineers: Collaborate with dev teams to sharpen code quality, boost resilience, and embed observability from the start.
- Own the cloud: Configure and manage cloud infrastructure (AWS), keeping everything humming at scale.
- Watch the signals: Build better monitoring and alerting systems to catch issues before they escalate.
- Troubleshoot deeply: Solve complex technical puzzles and help guide others through them.
- Automate everything: Write and maintain SOPs and automation scripts to reduce manual toil.
- Be the calm in the storm: Participate in the on-call rota and take ownership of live issues when they arise.
What We’re Looking For
- Solid experience debugging live applications and resolving production issues quickly.
- Background in building and supporting microservice-based applications.
- Confidence working with MongoDB, AWS services, and container tooling such as Docker and ECS.
- Familiarity with infrastructure-as-code and CI/CD pipelines (CloudFormation, CodeBuild, etc.).
- Comfort using monitoring/observability tools like Prometheus, New Relic, Grafana, or Datadog.
- Good grasp of scripting (Python or JavaScript) for automation and tooling.
- Clear thinking in the face of incidents plus the drive to learn from them.
Bonus Points For
- Knowledge of REST, GraphQL, and async messaging systems.
- Experience with Git workflows and CI/CD pipelines.
- Understanding of SRE principles (SLIs, SLOs, error budgets, etc.).
- Awareness of security and compliance (GDPR, privacy, risk management).
- Clear communication and a team-first attitude.
Why You’ll Love It Here
- You’ll work with brilliant engineers who care about quality, automation, and clean code.
- You’ll have the freedom to shape infrastructure as we scale and evolve.
- You’ll gain deep exposure to modern DevOps tooling, incident response strategy, and production engineering.
- Your voice will matter, from tech choices to process improvements.
Apply directly or contact us for more details.