Senior Manager, Site Reliability Engineering (SRE) – Digital Banking

BMO

Toronto

On-site

CAD 92,400 - 171,600

Full time

Job summary

A leading bank seeks a Senior Manager for Site Reliability Engineering to enhance its Digital Banking Platform. This role is vital to delivering high-performing, secure banking services to millions of customers. The successful candidate will lead teams, drive improvements in service reliability, and contribute to strategic direction.

Benefits

Health Insurance

Tuition Reimbursement

Performance Bonuses

Qualifications

Typically 7+ years of experience in a relevant field.
Experienced in DevOps, Cybersecurity, and Cloud Computing.
Demonstrated leadership and excellent communication skills.

Responsibilities

Lead the Site Reliability Engineering and Infrastructure Patching teams.
Oversee incident resolution and ensure high application performance.
Mentor a team of 8–10 SREs and establish best practices.

Skills

Troubleshooting

Observability

Monitoring

Incident Management

Communication

Education

Post-secondary degree in related field

Tools

DevOps

Cloud Computing

Incident Management

Role Overview

We are seeking a hands-on and strategic Senior Manager to lead our Site Reliability Engineering (SRE) and Infrastructure Patching teams supporting the Digital Banking Platform. This role is crucial to our mission of providing always-on, secure, and high-performing banking services for millions of customers.

Key Responsibilities

Provide strategic oversight for incident resolution efforts led by the SRE team, ensuring rapid restoration and comprehensive root cause analysis (RCA).
Collaborate across engineering, platform, and security teams to troubleshoot issues spanning full-stack environments (cloud, container, and legacy platforms).
Maintain high availability and performance of digital banking applications (primarily AWS, OpenShift, Linux, with some legacy WebSphere).
Champion proactive monitoring, observability, and alerting (e.g., Dynatrace, OpenSearch).

SRE & Reliability Engineering

Define and implement best practices for reliability, scalability, and availability tailored to large-scale digital banking.
Continuously improve CI/CD pipelines, release automation, and deployment practices.
Drive rigorous postmortem analysis and a culture of blameless continuous improvement.
Optimize for scalability, redundancy, and resilience—minimizing customer impact from incidents.

Infrastructure Patching

Oversee patching and maintenance for cloud and on-prem environments (AWS, OpenShift, Red Hat VMs, some WebSphere).
Implement zero-downtime patching strategies and automation that mitigate operational risk and security vulnerabilities.
Partner with security teams to enforce compliance, harden platforms, and remediate vulnerabilities.

Reporting & Analytics

Provide strategic direction and oversight for reporting frameworks and analytics capabilities, ensuring actionable insights into platform reliability and operational performance.
Collaborate with teams to refine dashboards, metrics, and reporting tools that provide clear visibility for stakeholders and leadership.
Drive initiatives to improve data accuracy and alignment with organizational goals, ensuring reporting supports decision-making and strategic priorities.

Team Leadership & Process Improvement

Lead, mentor, and grow a high-performing team of 8–10 SREs.
Drive a culture of ownership, operational excellence, and continuous learning.
Establish and enforce best practices for incident management, operational documentation, and process automation.
Collaborate with development, infrastructure, and product teams to enhance observability, deployment, and proactive issue detection.

Required Skills

Hands-on troubleshooting skills in complex, distributed, or high-availability technical environments.
Experience in observability, monitoring, and incident management for critical platforms.
Demonstrated leadership in technical settings, which may include leading projects or initiatives or mentoring teams, even without prior experience as a formal people manager.
Strong ability to provide oversight and strategic direction for reporting and analytics frameworks, ensuring alignment with organizational goals.
Excellent communicator, able to translate technical detail for both engineers and executives.

Qualifications

Typically 7+ years of relevant experience and a post-secondary degree in a related field or an equivalent combination of education and experience. Proficiency levels include:

Intermediate: DevOps, Cybersecurity, Privacy Concepts, Emotional Agility
Advanced: IT Infrastructure Library, RPA, Cloud Computing, Configuration Management, Container Orchestration, System Design, Incident Management, Learning Agility, API Management, Automation Pipelines, Automated Testing, QA & Control, Data-Driven Decision Making

Salary & Benefits

$92,400.00 - $171,600.00, salaried, with potential performance incentives, bonuses, health insurance, tuition reimbursement, and other benefits. For more information, see Total Rewards.

About Us

At BMO, our purpose is to create lasting, positive change. We value diversity and inclusion, and we support your growth and impact from day one. Learn more at BMO Careers.

Note: BMO does not accept unsolicited resumes. Please work with our recruiters or apply directly through our website.
