VP, Problem & Knowledge Management Lead, SRE & Governance, Group Technology

DBS Bank Limited

Singapore

On-site

SGD 120,000 - 180,000

Full time

Job summary

A leading financial institution is seeking a seasoned SRE Problem and Knowledge Management Team Lead to steer problem management operations, enhance SRE practices, and foster a blameless culture. The role requires extensive experience in incident management and process improvement, and the successful candidate will lead a team in delivering high-quality service and reliable technology solutions.

Qualifications

Minimum 15 years of experience in process improvement/RCA, including leading discussions as a problem manager.
At least 10 years of software development or operations experience.
Basic knowledge of Linux, AIX, Solaris, and Windows.

Responsibilities

Mentor the team in root cause analysis (RCA) activities.
Lead facilitation for high-severity incidents.
Manage resources for effective problem management activities.

Skills

Process improvement

Root Cause Analysis

Incident Management

Communication

Troubleshooting

Stakeholder Management

Tools

JIRA

Confluence

Jenkins

SonarQube

Dynatrace

Grafana

Cloud Computing

The Role:

This position is for an SRE Problem and Knowledge Management Team Lead within the Site Reliability Engineering and Governance (SRE & Governance) department, an enabling group.

This role is expected to strategically lead incident retrospective/problem management operations and other SRE activities related to maintenance management, including availability, performance, change management, monitoring, capacity planning, and solutions derived from emergency response.

The Team Lead must ensure that retrospective activities are effectively orchestrated and carried out, promoting a blameless culture in accordance with SRE principles.

Responsibilities:

Mentor the team in facilitating and conducting root cause analysis (RCA) activities from start to finish.
Lead facilitation for high-severity incidents, liaising with senior management and providing regular updates.
Present findings and action plans at RCA Forum, Tech Risk Forum, and other senior management meetings.
Rapidly absorb and effectively apply new technology.
Communicate clearly with both technical and non-technical colleagues.
Work to high standards within agreed timescales.
Perform any other reasonable tasks as requested by supervisors or senior management.
Manage resources to ensure effective problem management activities.
Provide platforms and channels to keep stakeholders updated on retrospectives and RCA activities.
Demonstrate authority during problem management calls.
Serve as the point of contact for high-severity incidents, from retrospective calls to Management Report documentation and publication.
Take accountability for initiatives to enhance SRE practices based on retrospectives.
Collaborate with Engineering Teams within SRE and with Lines of Business (LOBs) on preventive enabling activities.

Requirements:

Minimum 15 years of experience in process improvement/RCA, leading discussions as a problem manager or incident commander, preferably in Technology & Operations.
Experience with JIRA, Confluence, Jenkins, Nexus, SonarQube, Bitbucket, S3, and Cloud Computing.
Good exposure to logging and monitoring tools such as Dynatrace, Prometheus, Grafana, and ELK.
Deep understanding of Incident & Problem Management functions and activities, including hardware- and software-related issues.
Ability to work with stakeholders and command centers in troubleshooting, escalating, and resolving critical site incidents.
Identify recurring issues and collaborate with cloud, infrastructure, product development, vendors, and other stakeholders to investigate and resolve causes.
Maintain accurate incident documentation, including impact, timelines, and mitigation steps.
Strong verbal and written communication skills, especially for documentation.
At least 10 years of software development, technical support, or operations experience.
Basic knowledge of Linux, AIX, Solaris, and Windows.
Exposure to enterprise databases like Oracle, SQL Server, MariaDB, MongoDB, and Sybase.
Knowledge of systems, multi-tier applications, and network troubleshooting.
Awareness of Public/Private/Hybrid cloud solutions.