Senior Manager, Site Reliability Engineering (SRE) – Digital Banking

BMO

Toronto

On-site

CAD 92,000 - 172,000

Full time

Yesterday

Job summary

A leading bank seeks a Senior Manager, Site Reliability Engineering, to strengthen its Digital Banking Platform. The role is central to delivering high-performing, secure banking services to millions of customers. The successful candidate will lead teams, drive improvements in service reliability, and contribute to strategic direction.

Benefits

Health Insurance
Tuition Reimbursement
Performance Bonuses

Qualifications

  • Typically 7+ years of experience in a relevant field.
  • Experienced in DevOps, Cybersecurity, and Cloud Computing.
  • Demonstrated leadership and excellent communication skills.

Responsibilities

  • Lead the Site Reliability Engineering (SRE) and Infrastructure Patching teams.
  • Oversee incident resolution and ensure high application performance.
  • Mentor a team of 8–10 SREs and establish best practices.

Skills

Troubleshooting
Observability
Monitoring
Incident Management
Communication

Education

Post-secondary degree in related field

Tools

AWS
OpenShift
Linux
WebSphere
Dynatrace
OpenSearch

Job description

Role Overview

We are seeking a hands-on and strategic Senior Manager to lead our Site Reliability Engineering (SRE) and Infrastructure Patching teams supporting the Digital Banking Platform. This role is crucial to our mission of providing always-on, secure, and high-performing banking services for millions of customers.

Key Responsibilities

  • Provide strategic oversight for incident resolution efforts led by the SRE team, ensuring rapid restoration and comprehensive root cause analysis (RCA).
  • Collaborate across engineering, platform, and security teams to troubleshoot issues spanning full-stack environments (cloud, container, and legacy platforms).
  • Maintain high availability and performance of digital banking applications (primarily AWS, OpenShift, Linux, with some legacy WebSphere).
  • Champion proactive monitoring, observability, and alerting (e.g., Dynatrace, OpenSearch).

SRE & Reliability Engineering

  • Define and implement best practices for reliability, scalability, and availability tailored to large-scale digital banking.
  • Continuously improve CI/CD pipelines, release automation, and deployment practices.
  • Drive rigorous, blameless postmortem analysis and a culture of continuous improvement.
  • Optimize for scalability, redundancy, and resilience to minimize customer impact from incidents.

Infrastructure Patching

  • Oversee patching and maintenance for cloud and on-prem environments (AWS, OpenShift, Red Hat VMs, some WebSphere).
  • Ensure zero-downtime patching strategies and automation to mitigate operational risk and security vulnerabilities.
  • Partner with security teams to enforce compliance, harden platforms, and remediate vulnerabilities.

Reporting & Analytics

  • Provide strategic direction and oversight for reporting frameworks and analytics capabilities, ensuring actionable insights into platform reliability and operational performance.
  • Collaborate with teams to refine dashboards, metrics, and reporting tools that provide clear visibility for stakeholders and leadership.
  • Drive initiatives to improve data accuracy and alignment with organizational goals, ensuring reporting supports decision-making and strategic priorities.

Team Leadership & Process Improvement

  • Lead, mentor, and grow a high-performing team of 8–10 SREs.
  • Drive a culture of ownership, operational excellence, and continuous learning.
  • Establish and enforce best practices for incident management, operational documentation, and process automation.
  • Collaborate with development, infrastructure, and product teams to enhance observability, deployment, and proactive issue detection.

Required Skills

  • Hands-on troubleshooting skills in complex, distributed, or high-availability technical environments.
  • Experience in observability, monitoring, and incident management for critical platforms.
  • Demonstrated leadership in technical settings, which may include leading projects or initiatives or mentoring teams, even without prior experience as a formal people manager.
  • Strong ability to provide oversight and strategic direction for reporting and analytics frameworks, ensuring alignment with organizational goals.
  • Excellent communicator, able to translate technical detail for both engineers and executives.

Qualifications

Typically 7+ years of relevant experience and a post-secondary degree in a related field or an equivalent combination of education and experience. Proficiency levels include:

  • Intermediate: DevOps, Cybersecurity, Privacy Concepts, Emotional Agility
  • Advanced: IT Infrastructure Library, RPA, Cloud Computing, Configuration Management, Container Orchestration, System Design, Incident Management, Learning Agility, API Management, Automation Pipelines, Automated Testing, QA & Control, Data-Driven Decision Making

Salary & Benefits

CAD $92,400.00 - $171,600.00, salaried, with potential performance incentives and bonuses, plus health insurance, tuition reimbursement, and other benefits. For more information, see BMO's Total Rewards page.

About Us

At BMO, our purpose is to create lasting, positive change. We value diversity and inclusion, and we support your growth and impact from day one. Learn more at BMO Careers.

Note: BMO does not accept unsolicited resumes. Please work with our recruiters or apply directly through our website.
