Senior Site Reliability Engineer

Cryptio

United Kingdom

Remote

GBP 70,000 - 90,000

Full time


Job summary

A leading crypto infrastructure firm is seeking a Senior Site Reliability Engineer who will take full ownership of the platform's reliability and observability. This role involves working across diverse technologies from AWS to Rust and TypeScript, ensuring a stable and resilient service as the company scales. With a focus on collaboration and continuous improvement, this position allows for shaping SRE culture in a fully remote setting for UK-based candidates. Competitive salary and benefits are offered.

Benefits

Competitive salary
Full benefits package
100% remote work (UK only), with opportunities to visit the Paris or London hubs

Qualifications

  • 5+ years of experience in Site Reliability, DevOps, or Infrastructure Engineering roles.
  • Deep understanding of distributed systems and debugging at the network, application, and database layers.
  • Hands-on experience with AWS, container orchestration (Kubernetes, ECS), and Infrastructure-as-Code tools.

Responsibilities

  • Own reliability end-to-end: design, measure, and improve service availability.
  • Enhance observability: expand and refine metrics, logs, and traces.
  • Lead incident management: define playbooks and improve workflows.

Skills

Site Reliability Engineering
DevOps
Infrastructure Engineering
AWS
Kubernetes
Rust
TypeScript
Cassandra
ClickHouse

Tools

GitLab CI
Docker
Pulumi
Grafana
Prometheus

Job description

About Cryptio

We’re Cryptio. We build infrastructure to bring financial integrity to the crypto economy. Our enterprise-grade back-office and data platform powers mission-critical accounting, reporting, and operational workflows for institutions, corporates, and crypto-native organisations.

We’re trusted by leaders like Circle, Societe Generale, Uniswap, Gemini, and the Government of El Salvador. We’ve raised $26m from top investors including Point Nine, 1kx, Tim Draper, and Ledger Cathay.

The opportunity

We’re hiring a Senior Site Reliability Engineer (SRE) to take full ownership of Cryptio’s reliability, observability, and incident response. You’ll work across our stack—from AWS infrastructure to Rust microservices, TypeScript indexers, and data-heavy backends—to ensure our platform remains fast, stable, and resilient as we scale.

This is a role for a hands-on builder who can see across systems, trace complex issues, and design reliability into everything we ship. You’ll collaborate closely with engineering and product teams to define SLAs/SLOs, strengthen monitoring and alerting, improve incident management, and build the processes and tooling that make reliability a shared culture at Cryptio.

Key technologies
  • AWS (EKS, S3, GuardDuty, Route53, IAM, and more)

  • Rust, TypeScript (Nest.js, React, OpenAPI)

  • PostgreSQL, Cassandra, ClickHouse

  • Pulumi, GitLab CI, Docker, Kubernetes

  • Grafana, Prometheus, Loki, Jaeger

What you’ll do
  • Own reliability end-to-end: design, measure, and improve service availability, latency, and performance across Cryptio’s platform

  • Enhance observability: expand and refine metrics, logs, and traces to provide deep insight into our Rust and TypeScript services

  • Lead incident management: define playbooks, improve response workflows, and foster a blameless postmortem culture

  • Strengthen infrastructure: optimise AWS configurations, CI/CD pipelines, autoscaling, and networking for reliability and cost efficiency

  • Collaborate across teams: work with product and engineering leads to ensure reliability is considered at every design stage

  • Drive continuous improvement: identify systemic weaknesses, automate recovery where possible, and reduce MTTR across the stack

  • Champion SRE best practices: guide teams on capacity planning, runbooks, and resilience testing

What we’re looking for
  • 5+ years of experience in Site Reliability, DevOps, or Infrastructure Engineering roles

  • Deep understanding of distributed systems and debugging at the network, application, and database layers

  • Hands-on experience with AWS, container orchestration (Kubernetes, ECS), and Infrastructure-as-Code tools (Pulumi or similar)

  • Comfortable tracing through Rust and TypeScript code to diagnose complex performance or reliability issues

  • Experience with (or willingness to learn) Cassandra and ClickHouse in production

  • Strong collaborator with excellent communication skills

  • Systematic, analytical, and passionate about building reliable systems at scale

  • Interest in (or curiosity about) crypto, finance, or large-scale data systems

Why you’ll love this role
  • True ownership of reliability and uptime across a critical, fast-growing SaaS platform

  • Opportunity to shape SRE culture and processes from the ground up

  • Work with a world-class engineering team at the intersection of crypto, accounting, and data infrastructure

  • Freedom to experiment and improve observability, alerting, and recovery pipelines end-to-end

  • 100% remote (UK only), with opportunities to visit our Paris or London hubs

  • Competitive salary and full benefits package

Interview process
  • Talent Screen (15–30 min): Initial call to discuss your background, Cryptio, and the role

  • Technical Interview (60 min): Deep dive into reliability, AWS, and debugging scenarios

  • Team Interview (45 min): Meet an engineer and product manager to explore cross-team collaboration

  • CTO Interview (45 min): Discussion about technical strategy, ownership, and your vision for reliability at Cryptio

If this sounds like you, we would love to hear from you 🙌

At Cryptio, we move fast and take ownership of outcomes. We learn from failures, celebrate wins, and let humility, curiosity, and a passion for crypto guide how we work. If you value collaboration and want to build with purpose, you’ll feel right at home here.
