Site Reliability Engineer

TP ICAP

Greater London

On-site

GBP 60,000 - 80,000

Full time

Today

Job summary

A financial services company based in Greater London seeks a Site Reliability Engineer to ensure the uptime and performance of Global Analytics services. You will collaborate with software engineering and QA teams to maintain CI/CD pipelines, manage incidents, and reduce repetitive tasks. The ideal candidate has solid experience with financial trading systems, AWS, and relevant tools such as Grafana and Docker. This role requires a degree-level education and a focus on long-term results in a critical production environment.

Qualifications

Educated to degree level or equivalent combination of education and experience.
Solid experience working with financial trading systems.
Good understanding of high-level networking systems (e.g. firewalls, load balancers).
Experience working with cloud platforms, preferably AWS, with Kubernetes and Docker.
Experience working with monitoring and observability tools such as Grafana and Prometheus.
Knowledge of CI/CD pipeline tools and Infrastructure as Code tools.
Scripting and Automation experience, ideally with Python and PowerShell.
Experience of application performance profiling tools.
Highly analytical, focused on long-term results and delivery.

Responsibilities

Ensure uptime, availability, and performance of Global Analytics services.
Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Respond to incidents and outages, working with engineering teams.
Prevent service disruption through proactive alerts.
Work with engineering to reduce repetitive tasks.
Build and maintain internal tools to improve productivity.
Implement and maintain logging, metrics, and tracing systems.
Plan for scaling capacity and infrastructure needs.
Ensure compliance with departmental policies.
Collaborate to maintain and improve CI/CD pipelines.
Collaborate with QA for safe software releases.
Ensure systems are secure and meet compliance standards.

Skills

Financial trading systems

High-level Networking systems

AWS

Kubernetes

Docker

Monitoring tools (Grafana, Prometheus)

CI/CD pipeline tools (GitLab)

Infrastructure as Code tools (Terraform)

Scripting (Python, PowerShell)

Education

Degree level or equivalent experience

Tools

Grafana

Prometheus

GitLab

Terraform

Role Overview

The Global Analytics team is responsible for developing and maintaining Price Discovery solutions used by the Front Office to generate and disseminate market information to clients. This data and the associated financial calculations are integrated into a range of applications across the firm. As the Site Reliability Engineer, you will play a critical role in ensuring the availability, reliability, and performance of our production applications, bridging the gap between the software and operations engineering teams.

Role Responsibilities

Ensure uptime, availability, and performance of Global Analytics services

Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs)

Respond to incidents and outages, working with the Software and Operations engineering teams to resolve them quickly

Respond to application and infrastructure alerts to prevent service disruption

Work with the Software Engineering team to reduce repetitive tasks such as deployments and monitoring

Build and maintain internal tools to improve developer productivity

Implement and maintain logging, metrics and tracing systems with alignment to Global Architecture best practices

Plan for scaling capacity, forecasting future infrastructure needs

Ensure compliance with departmental policies (e.g. change management, IT security standards, release management, incident management)

Collaborate with Software Engineering team to maintain and improve continuous integration and deployment pipelines

Collaborate with QA team to ensure safe and reliable software releases

Ensure that systems are secure and meet industry standards and regulatory requirements

Experience / Competences

Educated to degree level or equivalent combination of education and experience

Solid experience working with financial trading systems

Good understanding of high-level networking systems (e.g. firewalls, load balancers)

Experience working with cloud platforms, preferably AWS, with Kubernetes and Docker

Experience working with monitoring and observability tools such as Grafana and Prometheus

Knowledge of CI/CD pipeline tools such as GitLab and Infrastructure as Code (IaC) tools such as Terraform

Scripting and Automation experience, ideally with Python and PowerShell

Experience of application performance profiling tools

Highly analytical, focused on long-term results and delivery

Job Band & Level

Professional / Level 5
