Role: Site Reliability Engineer (Node.js)
Location: Hybrid / Remote (UK-based)
Tech Stack: Node.js | AWS | MongoDB | Docker | CI/CD | Prometheus | Python
Why This Role?
Looking to work at the intersection of DevOps, backend engineering, and real-time problem-solving? Here’s your chance to make a real impact in a high-scale cloud environment, keeping production systems fast, reliable, and resilient for thousands of users.
You’ll join a collaborative, tech-savvy team dedicated to making things just work better. From improving observability across microservices to responding to high-priority incidents, this is your platform to shape how scalable applications are delivered and supported.
What You’ll Be Doing
- Fix and improve: Hunt down bugs in live Node.js microservices and make production more stable every day.
- Pair up with engineers: Collaborate with dev teams to sharpen code quality, boost resilience, and embed observability from the start.
- Own the cloud: Configure and manage cloud infrastructure (AWS), keeping everything humming at scale.
- Watch the signals: Build better monitoring and alerting systems to catch issues before they escalate.
- Troubleshoot deeply: Solve complex technical puzzles and help guide others through them.
- Automate everything: Write and maintain SOPs and automation scripts to reduce manual toil.
- Be the calm in the storm: Participate in the on-call rota and take ownership of live issues when they arise.
What We’re Looking For
- Solid experience debugging live Node.js applications and resolving production issues fast.
- Background in building and supporting microservice-based applications.
- Confidence working with MongoDB, AWS services, and container tooling such as Docker and ECS.
- Familiarity with infrastructure-as-code and CI/CD pipelines (CloudFormation, CodeBuild, etc.).
- Comfort using monitoring/observability tools like Prometheus, New Relic, Grafana, or Datadog.
- Good grasp of scripting (Python or JS) for automation and tooling.
- Clear thinking in the face of incidents—plus the drive to learn from them.
Bonus Points For
- Knowledge of REST, GraphQL, and async messaging systems.
- Experience with Git workflows and CI/CD pipelines.
- Understanding of SRE principles (SLIs, SLOs, error budgets, etc.).
- Awareness of security and compliance (GDPR, privacy, risk management).
- Clear communication and a team-first attitude.
Why You'll Love It Here
- You’ll work with brilliant engineers who care about quality, automation, and clean code.
- You’ll have the freedom to shape infrastructure as we scale and evolve.
- You’ll gain deep exposure to modern DevOps tooling, incident response strategy, and production engineering.
- Your voice will matter—from tech choices to process improvements.
Apply directly or contact annie.palmer@wearenumi.com