SRE at High Growth B2C Startup | AI + Education + Gamification

Gizmo

London

Hybrid

GBP 150,000 - 200,000

Full time

3 days ago

Job summary

A fast-growing AI startup in London is seeking a Site Reliability Engineer to manage performance and reliability as user traffic climbs. This role requires hands-on experience in scaling systems, defining SLOs, and automating operations on Kubernetes. Ideal candidates will thrive in a collaborative startup environment and prioritize impactful work. The position offers a competitive salary, equity, and a hybrid working model.

Benefits

Highly competitive salary

Equity included

Private health insurance

Opportunity to become an early employee

Qualifications

Experience running relational stores at 100k+ TPS or 1M+ concurrent users.
Ability to codify error budgets and partner on trade-offs.

Responsibilities

Define SLIs/SLOs for latency, availability and error rate.
Perform load-testing and capacity modelling.
Automate repetitive operations on Kubernetes and CI/CD.

Skills

Hands-on scale experience

Strong backend fundamentals

Proven track record of setting SLOs

Comfort with Kubernetes

Collaborative and feedback-driven

Driven by impact

Tools

PostgreSQL

OpenSearch

Redis

Hasura

Cloudflare Workers

Prometheus

Grafana

Gizmo is an AI startup on a mission to make learning so easy that anyone can learn anything. We're building Duolingo for anything - a platform that uses gamification and social mechanics to make learning fun.

With over 1 million monthly active users and $4M in annual recurring revenue, we’re already one of the fastest-growing startups in the UK. Backed by leading investors, we recently raised $16M in Series A funding to accelerate our vision of helping 1 billion people learn.

Role Overview
Reporting to the founders, you will own capacity, performance and reliability for Gizmo’s full-stack platform as daily traffic climbs from hundreds of thousands to millions of users. You’ll write code across the stack, but your charter is classic SRE: defend SLOs, eliminate toil, and raise the ceiling on scale before it becomes a hard limit.

Key Responsibilities

Define SLIs/SLOs for latency, availability and error rate; codify error budgets and partner with product teams on trade-offs.
Perform load-testing, capacity modelling and up-front scalability design for PostgreSQL, OpenSearch, Redis, Hasura and Cloudflare Workers; produce data-driven scaling plans.
Extend metrics, structured logging and tracing; establish alert rules that page only on user-visible impact; build actionable runbooks.
Join the on-call rotation, lead blameless post-mortems, drive remediation work to closure and track MTTR/MTBF improvements.
Automate repetitive ops on Kubernetes and CI/CD; keep “toil” below 50% of your time by pushing fixes into code.
Coach full-stack engineers on query optimisation, schema design and back-pressure techniques; document patterns and anti-patterns in an SRE playbook.

Skills

Hands-on scale experience: you have run relational stores at 100k+ TPS or 1M+ concurrent users (e.g., multi-tenant PostgreSQL, sharded MySQL).
Strong backend fundamentals around concurrency, caching, indexing and distributed systems trade-offs.
Proven track record of setting SLOs, building dashboards (Prometheus/Grafana, OpenTelemetry, etc.) and tuning alerts.
Comfort with Kubernetes, IaC and cloud-native patterns; can debug from network to application layer.
Start-up bias for action: you prioritise high-leverage fixes, ship iteratively and own outcomes end-to-end.
Collaborative and feedback-driven; you welcome post-mortem culture and continuous improvement.
Driven by impact - you prioritise work that moves the needle!

Nice-to-haves: experience with Hasura internals, Cloudflare Workers edge optimisation, or running OpenSearch clusters at scale.

Benefits

Highly competitive salary.
You'll own a piece of what you're building - equity included.
Hybrid working model with 4 days in the London Liverpool Street office.
The opportunity to become one of the earliest employees in one of the UK’s fastest-growing startups.
Private health insurance.
