Site Reliability Engineer (Remote)

Rullion Ltd

Remote

GBP 100,000 - 125,000

Full time

Yesterday

Job summary

A UK-based technology firm is looking for a Site Reliability Engineer to design, implement, and maintain scalable infrastructure. The ideal candidate will have proven experience in Site Reliability Engineering or DevOps and strong skills in cloud platforms and automation. The role is central to improving system reliability and performance in a collaborative environment and to delivering mission-critical projects effectively.

Qualifications

  • Proven experience in Site Reliability Engineering, DevOps, or similar roles.
  • Strong understanding of cloud platforms and containerisation technologies.
  • Proficiency in scripting languages like Python, Bash, or Go.
  • Hands-on experience with monitoring and observability tools.

Responsibilities

  • Design, implement, and maintain scalable infrastructure and services.
  • Develop automation scripts to improve reliability.
  • Monitor and troubleshoot system performance.

Skills

Site Reliability Engineering
Cloud platforms (AWS, Azure, GCP)
Containerisation technologies (Kubernetes, Docker)
Scripting languages (Python, Bash, Go)
Monitoring tools (Prometheus, Grafana, ELK)
Infrastructure-as-code tools (Terraform, Ansible)
Networking concepts
Problem-solving skills

Job description

Key Responsibilities:
  • Design, implement, and maintain scalable, highly available infrastructure and services.
  • Develop automation scripts and tools to improve system reliability and operational efficiency.
  • Monitor and troubleshoot system performance, identifying and resolving issues to minimise downtime.
  • Implement and maintain CI/CD pipelines to support efficient software delivery.
  • Develop and enforce best practices for security, monitoring, and incident management.
  • Collaborate with development teams to enhance application performance and stability.
  • Create detailed documentation and conduct post-incident reviews to identify root causes and implement long-term solutions.

Essential Skills and Experience:
  • Proven experience in Site Reliability Engineering, DevOps, or similar roles.
  • Strong understanding of cloud platforms (AWS, Azure, or GCP) and containerisation technologies (Kubernetes, Docker).
  • Proficiency in scripting languages such as Python, Bash, or Go.
  • Hands-on experience with monitoring and observability tools like Prometheus, Grafana, and the ELK stack.
  • Familiarity with infrastructure-as-code tools like Terraform or Ansible.
  • Solid understanding of networking concepts and system security best practices.
  • Excellent problem-solving skills and a passion for automation and continuous improvement.

Desirable:
  • Certifications in cloud platforms or DevOps tools.
  • Experience with large-scale distributed systems.

This role offers the opportunity to work on mission-critical projects in a fast-paced and collaborative environment, driving innovation and reliability in our technology ecosystem.

Rullion celebrates and supports diversity and is committed to ensuring equal opportunities for both employees and applicants.