COE Lead - Observability & Tooling

TN United Kingdom

Bury

On-site

GBP 60,000 - 90,000

Full time

Today

Job summary

An established industry player is seeking a CoE Lead for Observability & Tooling to enhance their technical capabilities. This pivotal role involves designing and maintaining an Observability platform, ensuring seamless collaboration across DevOps and Engineering teams. You will lead the development of intelligent alerts and dashboards, enabling proactive monitoring and incident resolution. With a focus on automation and contract management, you will drive improvements in service reliability and efficiency. Join a forward-thinking team where your expertise will significantly impact the organization's operational success.

Qualifications

  • 5-8 years in technology service delivery with a focus on observability.
  • Experience managing third-party provider contracts and SLAs.

Responsibilities

  • Design and maintain the Observability platform for efficient operations.
  • Collaborate with teams to automate incident detection and resolution.

Skills

Leadership and Collaboration
Communication Skills
Technical Expertise in Observability Tools
Cloud Environments (AWS, Azure)
Monitoring and Observability
Contract Management

Tools

Prometheus
Grafana
New Relic
Terraform
Jira Service Management

Job description

COE Lead - Observability & Tooling, Bury

Client:

JD Group

Location:

Bury, United Kingdom

Job Category:

Other

EU work permit required:

Yes

Job Reference:

2900839ac228

Job Views:

6

Posted:

05.05.2025

Expiry Date:

19.06.2025

Job Description:

The CoE Lead - Observability & Tooling at JD Sports Fashion Plc is a critical, hands-on technical role focused on designing, building, and maintaining the company's Observability platform. The role ensures that our technology platforms operate efficiently and reliably, providing early insights for Engineering, Service Reliability, Service Delivery, and DevOps teams.

The CoE Lead will manage the contract with third-party providers responsible for the execution layer, ensuring adherence to service-level agreements (SLAs) and key performance indicators (KPIs). The position involves a 75% focus on the design of frameworks and a 25% focus on implementation and adoption.

Job Title – Centre of Excellence Lead - Observability & Tooling

Working hours – 40

What You'll Be Doing:

We are looking for an experienced CoE Lead to design, build, and maintain our Observability platform. The CoE Lead will work closely with DevOps, Engineering, Service Reliability, and Service Delivery teams to continuously improve our Observability capabilities.

This role is a technical, hands-on position with a 75% focus on framework design and 25% on implementation and adoption.

You will contribute to pipeline design, enabling observability from the first deployment in test environments and providing early insights for Engineering, Service Reliability, Service Delivery, and DevOps teams. The role involves building frameworks for intelligent alerts to help Service Delivery teams quickly triage incidents and enable automated runbooks. Additionally, you will identify and deploy tools to automate incident detection, notifications, triage, and resolution.

Key Responsibilities:

  • Pipeline Approach: Adopt a pipeline approach to enable observability of services deployed across multiple environments, balancing monitoring, logging, and tracing based on service classification.
  • Intelligent Alerts: Design and build intelligent alerts using pipelines, and onboard automated runbooks whose triggers leave clear audit logs in service management tools such as Jira Service Management.
  • Dashboards: Create and maintain dashboards for proactive monitoring of services to help teams resolve incidents quickly.
  • Monitoring Capability: Continuously improve monitoring capabilities to identify key alerts and thresholds for early warnings before services fail.
  • Automation: Enrich intelligent alerts with fine-grained detail on the underlying services causing issues, and extend them to trigger automated runbook execution with clear audit logs.
  • Collaboration: Work closely with DevOps, Service Reliability, and Service Delivery teams to identify and deploy tools that automate incident detection, notifications, triage, and resolution.

What We're Looking For:

Skills:

  • Leadership and Collaboration: Strong leadership skills with the ability to mentor, coach, and develop high-performing teams.
  • Excellent communication and interpersonal skills, capable of building strong relationships with both technical and business stakeholders.
  • Proven ability to collaborate effectively with cross-functional teams, including DevOps, Engineering, Service Reliability, and Service Delivery teams.
  • Technical Expertise: In-depth knowledge of open-source and commercial observability tools (e.g., Prometheus, Grafana, New Relic).
  • Expertise in cloud environments (e.g., AWS, Azure) and infrastructure as code (IaC) tools like Terraform.
  • Monitoring and Observability: Experience in creating and maintaining dashboards for proactive monitoring of services.
  • Ability to design and build intelligent alerts using pipelines, enabling early detection of issues and automated incident response.
  • Knowledge of the latest technology trends in the monitoring landscape, such as OpenTelemetry.
  • Contract Management: Experience in managing third-party provider contracts, including negotiating terms, monitoring performance, and ensuring adherence to SLAs and KPIs.
  • Ability to integrate third-party providers seamlessly into the organisation's workflows, aligning with the overall strategic vision.

Experience:

  • Professional Experience: 5-8 years of experience in technology service delivery and management, focusing on observability, monitoring, and tooling.
  • Service Management: Practical experience in building and maintaining a Service Catalogue, assigning service level objectives (SLOs), and measuring service level indicators (SLIs).
  • Experience in operating production services during peak trading periods without service degradation.
  • Automation and Tooling: Knowledge of automation tools to simplify alert notifications and extend to automated runbook execution.
  • Experience in implementing observability solutions for retail stores or similar environments.
  • Proven experience in overseeing and managing Atlassian tools for effective tracking, collaboration, and service management.
