Site Reliability Engineer

Unitary

United Kingdom

Remote

GBP 60,000 - 80,000

Full time

3 days ago

Job summary

Unitary is seeking a Site Reliability Engineer to enhance system reliability and performance. This role involves building robust infrastructure while collaborating with development teams. Ideal candidates are problem-solvers comfortable with both urgent fixes and proactive improvements in a dynamic startup environment.

Benefits

Flexible hours and location

Competitive salary and equity package

Generous paid parental leave

Generous paid sick leave

Annual budget for professional development

Annual budget for health and wellness

Team offsites

Qualifications

Experience with visualisation tools such as Grafana.
Proficiency with metrics platforms such as Prometheus.
Strong coding skills in Go or Python.

Responsibilities

Implement alerting systems to detect and address issues early.
Collaborate with development teams for observability.
Optimise incident response through detailed runbooks.

Skills

Problem-solving

Collaboration

Monitoring

Automation

Observability

Education

Bachelor's degree in Computer Science or related field

Tools

Grafana

Prometheus

Incident.io

CI/CD tools

Kubernetes

Terraform

SRE (Unitary AI)

Description

The company

We are a rapidly growing startup developing solutions that blend human expertise and AI agents to handle manual customer and marketplace operations tasks. Our unique approach combines the strengths of human expertise (high accuracy and nuanced decision-making) with the advantages of AI automation (speed and cost efficiency). This cutting-edge technology helps businesses solve real-world challenges in trust & safety and beyond without complex technical integration. We believe in an online world free from harm, where we can trust AI to make safe and fair decisions.

We have raised about $25M in VC funding from top-tier funds including Creandum and Plural, and we operate at significant scale, analysing millions of images and videos every day. But we are just at the beginning of our journey, and we are very excited about our plans for growth over the coming year and beyond!

The role

We are now looking for a Site Reliability Engineer to ensure our systems run smoothly and reliably at scale. Your expertise in monitoring, observability, and system automation will help maintain the high availability and performance our customers depend on. You will work at the intersection of development and operations, using your technical skills to build robust infrastructure and streamline deployment processes.

Your mission will be to proactively identify and resolve system issues before they impact our customers. You will collaborate closely with development teams to implement monitoring solutions, create comprehensive alerting systems, and develop the tools needed to maintain system reliability. Initially, you will focus on enhancing our existing monitoring and alerting infrastructure, then gradually build self-healing systems and self-service capabilities that empower teams to diagnose and resolve issues independently.

As part of this role, you will:

Design and implement comprehensive alerting systems that detect issues early and provide actionable insights to speed up their resolution.
Collaborate with our development teams to ensure our observability stack provides clear visibility into system health and performance.
Optimise on-call processes, including creating and maintaining detailed runbooks that enable efficient incident response and knowledge sharing across teams.
Build self-healing systems using AI tools that automatically resolve common issues before they require human intervention.
Develop automation tools and diagnostic capabilities that help teams quickly identify and resolve issues when manual investigation is required.
Ensure secure and reliable code deployment processes through robust CI/CD pipelines and infrastructure automation.
Join our 24/7 support rotation, which provides first-level platform support to ensure a great customer experience.

Requirements

You

We are looking for someone who is excited about building innovative solutions and wants to have a large impact in a smaller company; you will be a key part of defining Unitary’s future during this early stage of our new product strategy. We need versatile people who are happy to get stuck into whatever needs doing, and are ready to learn and grow with the company.

For this particular role, we need a collaborative engineer who excels at working across teams and can translate complex technical concepts into actionable solutions. You should be comfortable balancing your time between fixing urgent issues and investing in proactive system improvements. Communication is crucial, as you'll be working closely with multiple engineers and may need to coordinate during high-stress incident situations.

We would love to hear from you if you:

Have worked with visualisation tools such as Grafana for creating and maintaining dashboards that provide meaningful insights into system performance
Are proficient with metrics platforms such as Prometheus, InfluxDB, or OpenTelemetry for collecting and analysing system data
Have experience with incident management tools such as Incident.io for coordinating response efforts and recording follow-up learnings and actions
Can demonstrate strong problem-solving skills and the ability to work autonomously
Are confident in writing production code in languages such as Go or Python
Thrive in a collaborative environment where group output and team achievements matter more than individual input

It would be even better, but not essential, if you have:

Experience working in a fully remote, international team
Previous startup experience
Experience building Slack bots or similar automation tools to streamline team workflows
Experience with CI/CD platforms for building reliable deployment pipelines (e.g. GitLab CI, ArgoCD)
Experience with Kubernetes and infrastructure as code tools such as Terraform for scalable system deployment
Familiarity with MLOps practices and tools, including monitoring machine learning systems in production

This role will report to the VP of Engineering and can be based in any time zone within 3 hours of the UK.

About us

The team

Unitary is a remote-first team of c. 20 people spread across Europe and North America who are fiercely passionate about making the internet a safer place, and deeply motivated to become a force for good. We have an ambition to create a company filled with happy, kind and collaborative people who achieve extraordinary things together. Our culture is built around the power of trust, transparency and self-leadership.

Working at Unitary

We are committed to creating a positive and inclusive culture built on genuine interest in each other's well-being. We offer progressive and market-leading benefits, including:

Flexible hours and location
Competitive salary and equity package
Generous paid parental leave
Generous paid sick leave
Annual budget for your professional development and growth
Annual budget for your individual health and wellness
Three team offsites to London or other exciting destinations in Europe

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Information Technology

Industries

Software Development
