Senior Site Reliability Engineer - Glasgow

Caspian One Ltd

Glasgow

On-site

GBP 70,000 - 90,000

Full time

Today

Job summary

A leading technology firm in Scotland is seeking a Senior Site Reliability Engineer to shape SRE practices. You will engineer automation frameworks and elevate observability while driving the adoption of SRE principles. The role requires expertise in scripting, system performance, and cross-team collaboration. Join us to improve performance and reliability at scale.

Qualifications

  • Experience in automation through scripting in Python or Go.
  • Strong knowledge of system performance metrics and tuning.
  • Proficient in incident response strategies and system reliability.

Responsibilities

  • Ensure high availability and performance of services.
  • Lead incident response and preventative measures.
  • Develop automation tools and scripts to enhance efficiency.
  • Monitor systems to identify bottlenecks and optimize performance.
  • Collaborate with teams to enhance reliability within development life cycles.

Skills

System Reliability
Performance Optimization
Automation with Python or Go
Cross-Team Collaboration

Job description

Overview

We're hiring several Senior Site Reliability Engineers to help shape a Centre of Excellence for SRE practices across a global tech estate. This is a high-impact, hands-on role where you'll engineer automation frameworks, elevate observability, and transform incident response at scale.

You'll be the go-to expert, guiding strategy, influencing culture, and driving adoption of SRE principles across diverse teams. From scripting to architecting resilient systems, your technical leadership will directly improve performance, scalability, and availability.

Responsibilities

  • System Reliability & Performance: Ensure high availability, optimal performance, and scalability of services through proactive monitoring, maintenance, and capacity planning.
  • Incident Response & Prevention: Lead resolution and analysis of system outages. Implement preventative measures to reduce recurrence and improve system resilience.
  • Automation & Tooling: Develop scripts and tools in Python or Go to automate operational processes, reduce manual effort, and enhance efficiency.
  • Performance Optimization: Monitor system metrics, identify bottlenecks, and apply best practices for performance tuning and resource utilization.
  • Cross-Team Collaboration: Partner with development and infrastructure teams to embed reliability and scalability into the software development life cycle.