Operations Site Reliability Engineer

JR United Kingdom

Bristol

On-site

GBP 50,000 - 90,000

Full time

15 days ago

Job summary

An established industry player is seeking an Operations Site Reliability Engineer to tackle demanding technical challenges in a fast-paced environment. This role is pivotal in ensuring the reliability and performance of critical production services, directly impacting thousands of users. You will work closely with engineering teams and external partners, employing your expertise in Linux systems, cloud platforms, and automation to solve complex problems. If you thrive under pressure and are passionate about innovation, this is your chance to make a significant impact in a rapidly evolving field.

Qualifications

  • 5+ years of experience administering Linux systems.
  • 2+ years operational experience with AWS or Google Cloud Platform.
  • Proficiency in at least one scripting language.

Responsibilities

  • Monitor and ensure the performance of production services.
  • Drive automation to reduce manual tasks and improve performance.
  • Create and maintain troubleshooting documentation.

Skills

Linux Administration
AWS
Google Cloud Platform
Automation Platforms
Ansible
Terraform
Scripting (Perl, Shell, Ruby, Python)
Networking Knowledge
Troubleshooting Skills

Education

Degree in Systems Engineering
Degree in Computer Science

Tools

Ansible Tower
Jenkins
Docker
Kubernetes

Job description

Operations Site Reliability Engineer, Bristol

Client: Recognition One

Location: Bristol, United Kingdom

Job Category: Other

EU work permit required: Yes

Posted: 08.05.2025

Expiry Date: 22.06.2025

Job Description:

Face a variety of demanding technical challenges across diverse disciplines, working directly with one of our largest and most influential clients to make a significant impact. This unique opportunity will unveil new possibilities in a rapidly evolving field. Are you an expert troubleshooter with a passion for innovation? This could be your chance.

The position is the last line of infrastructure support, far beyond technical customer support. It’s about solving the trickiest problems in the business that directly impact thousands of users within the largest global companies. You’ll often coordinate with product engineering and external partners like Google Cloud, as well as write automation and documentation to enable others to fix recurring problems.

Primary Responsibilities:
  • Be part of a critical operations team responsible for monitoring, availability, and performance of production services.
  • Respond to stakeholder requests within agreed timescales or SLOs.
  • Drive automation to reduce failures and manual tasks and to improve overall application performance and availability.
  • Perform systems administration activities to ensure smooth operation of applications across multiple platforms.
  • Coordinate and communicate with impacted stakeholders per incident management process.
  • Demonstrate ownership of events and incidents until resolution.
  • Conduct daily shift handovers to peers and management across multiple geographies.
  • Support maintenance activities impacting production applications.
  • Support critical systems handling sensitive and proprietary data.
  • Create, maintain, and update troubleshooting and support documentation.
  • Contribute to planning of application/infrastructure releases and configuration changes.
  • Administer and maintain all production environments.
  • Patch and upgrade existing applications.
  • Provide feedback and coaching to upstream teams (internal and vendors) to reduce escalations and improve customer experience.

Professional Experience Required:
  • A degree in Systems Engineering, Computer Science, or a related field, together with relevant experience, is preferred.
  • 5+ years of experience administering Linux systems.
  • Hands-on experience with various Linux distributions.
  • 2+ years operational experience with AWS or Google Cloud Platform.
  • Experience with automation platforms to automate repetitive tasks.
  • Familiarity with deployment tools such as Ansible Tower and Jenkins.
  • Experience deploying to large, global infrastructure.
  • Proficiency with orchestration/configuration tools like Ansible and Terraform.
  • Strong knowledge of networking, packet tracing, latency, and throughput issues.
  • Thorough understanding of HTTP(S), SMTP, TLS/SSL, DNS, LDAP, Kubernetes, and Docker.
  • Experience in system/application administration in high-availability, large-scale environments.
  • Proficiency in at least one scripting language such as Perl, shell, Ruby, or Python.
  • Experience tuning and optimizing monitoring systems.

Personal Skills:
  • A strong team player, quick to learn new technologies, adaptable, with a focus on delivery.
  • Excellent troubleshooting and problem-solving skills.
  • Ability to work calmly under pressure.
  • Interest in security.
  • Effective communicator at all organizational levels.

This role includes participation in weekend and holiday on-call support as required.
