The Incident and Problem Manager is responsible for the end-to-end management of the Incident and Problem Management processes to ensure minimal disruption to business operations and to improve service quality. The role is crucial in a highly regulated banking environment to support operational resilience, continuous improvement, and compliance with internal and external governance frameworks. This role requires strong leadership, analytical skills, and the ability to coordinate effectively across IT teams to manage incident resolution and drive root cause analysis for permanent fixes.
Duties and Responsibilities
- Own the end-to-end Incident Management process, ensuring consistent handling across all IT functions.
- Lead and coordinate responses to high-impact incidents, including mobilization of resolution teams and communication with stakeholders.
- Drive incident triage, categorization, prioritization, and timely escalation according to defined SLAs.
- Lead Major Incident Management (MIM) calls and ensure structured updates to senior stakeholders and impacted business units.
- Analyze incident trends to drive service improvement and reduce recurrence.
- Ensure accurate and timely documentation of all incidents, including incident timelines, actions taken, communications, impact assessments, and resolution steps.
- Maintain an incident log for audit and reporting purposes, aligned with internal governance and regulatory expectations.
- Own and maintain the Problem Management process and documentation in alignment with ITIL best practices.
- Identify root causes for recurring and significant incidents using structured methodologies such as 5 Whys, Fishbone (Ishikawa), or other techniques.
- Organize and lead cross-functional technical review meetings for problem investigation to drive toward permanent solutions.
- Maintain and manage the Known Error Database (KEDB) and validate temporary workarounds or fixes.
- Collaborate with Change Management to ensure corrective actions are implemented with minimal risk.
- Drive trend analysis using incident data to proactively identify areas of improvement and risk.
- Develop and maintain comprehensive documentation for problems, including problem records, root cause analysis reports, known error records, and workaround procedures.
- Ensure Problem Management documentation supports audit, compliance, and knowledge sharing objectives.
Process Integration & Governance
- Ensure effective integration with key ITSM processes such as:
  - Change & Release Management
  - Business Continuity & Disaster Recovery (BCP/DR)
  - Capacity and Availability Management
- Drive post-incident reviews (PIR) and lessons learned sessions to ensure knowledge is retained and action plans are executed.
- Provide periodic reports and dashboards on incident and problem trends, root causes, and improvement initiatives to stakeholders and auditors.
- Contribute to audit readiness and compliance with regulatory frameworks such as BNM RMiT, ISO 27001, and ITIL.