Site reliability engineer

writer.com

London

On-site

GBP 70,000 - 100,000

Full time

7 days ago

Job summary

A leading tech company seeks a Site Reliability Engineer to enhance the reliability and scalability of its cloud infrastructure. You will lead the design of systems ensuring high performance while mentoring junior engineers. Ideal candidates will have extensive experience in SRE, strong programming skills, and a passion for automation.

Benefits

Generous PTO and company holidays

Comprehensive medical and dental insurance

Paid parental leave for all parents (12 weeks)

Fertility and family planning support

Early-detection cancer testing through Galleri

Competitive pension scheme

Annual work-life stipends

Company-wide and team off-sites

Competitive compensation and stock options

Qualifications

At least 7 years of experience in Site Reliability Engineering.
Strong understanding of scalable system architecture.
Experience with containerization and orchestration technologies.

Responsibilities

Lead design and implementation of cloud infrastructure.
Automate infrastructure provisioning and management.
Develop monitoring and alerting systems.

Skills

Site Reliability Engineering

Infrastructure Design

Automation

Monitoring

Security Compliance

Python

Terraform

Docker

Kubernetes

Communication

Education

Bachelor’s degree in Computer Science or related field

Tools

Terraform

AWS

Azure

GCP

Prometheus

Grafana

ELK Stack

We are seeking a foundational member for the Cloud Infrastructure team at Writer. This role involves contributing to the development and implementation of our Site Reliability Engineering (SRE) program. The ideal candidate will ensure the reliability, scalability, performance, and security of Writer’s critical systems, proactively guaranteeing that our high-ROI products reach customers seamlessly.

Your responsibilities:

Lead the design, implementation, and maintenance of Writer, Inc.'s cloud infrastructure to ensure high availability and performance.
Design and implement scalable cloud automation to support seamless deployment for our largest enterprise customers.
Automate infrastructure provisioning and management using Terraform & Python.
Collaborate with development teams to optimize cloud resources and enhance system reliability.
Develop and maintain monitoring and alerting systems to proactively identify and resolve issues affecting system reliability.
Conduct post-mortem analyses of system failures to identify root causes and implement preventive measures.
Optimize and scale our cloud infrastructure to support growing user demand and ensure cost efficiency.
Ensure the security and compliance of our systems, adhering to industry standards and regulations.
Provide mentorship and technical guidance to junior engineers, fostering a culture of reliability and continuous improvement.
Stay current with emerging technologies and industry trends to improve our site reliability practices.

Is this you?

Proven expertise in Site Reliability Engineering with at least 7 years of hands-on experience.
Deep understanding of system architecture and infrastructure design for high availability and performance.
Bachelor’s degree in Computer Science, Engineering, or a related field.
Strong proficiency in programming languages such as Python, Java, or Go for automation and monitoring.
Experience with cloud platforms like AWS, Azure, or GCP, and their services for scalable, resilient systems.
Expertise in containerization and orchestration technologies (e.g., Docker, Kubernetes).
Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) for maintaining system health and performance.
Ability to lead and mentor junior engineers in reliability and system optimization best practices.
Excellent communication skills for effective collaboration with cross-functional teams and stakeholders.
Proactive approach to identifying and mitigating potential system failures and performance issues.

Preferred skills & experience:

Software engineering expertise.
Terraform.
Python.
Scala.
AWS/GCP.

Benefits & perks (UK full-time employees):

Generous PTO and company holidays.
Comprehensive medical and dental insurance.
Paid parental leave for all parents (12 weeks).
Fertility and family planning support.
Early-detection cancer testing through Galleri.
Competitive pension scheme and company contribution.
Annual work-life stipends for home office setup, cell phone, internet, wellness activities, and learning & development.
Company-wide and team off-sites.
Competitive compensation and stock options.
