VP, Problem & Knowledge Management Lead, SRE & Governance, Group Technology | Singapore, SG

DBS Bank Limited

Singapore

On-site

USD 150,000 - 220,000

Full time

Today

Job summary

A leading bank is seeking a VP, Problem & Knowledge Management Lead to spearhead SRE operations and incident management. The role requires extensive experience in process improvement and technical leadership. This is an opportunity to enhance the bank's technological governance and incident resolution strategies.

Qualifications

  • Minimum 15 years in process improvement and incident management.
  • Strong verbal & written communication skills.
  • In-depth knowledge of incident & problem management functions.

Responsibilities

  • Lead RCA activities and incident management.
  • Mentor team and collaborate with engineering teams.
  • Ensure documentation of incidents and resolutions.

Skills

Process improvement
Root cause analysis
Communication

Tools

JIRA
Confluence
Jenkins
Dynatrace
Grafana

Job description

VP, Problem & Knowledge Management Lead, SRE & Governance, Group Technology

The Role:

This position is for an SRE Problem and Knowledge Management Team Lead within the Site Reliability Engineering and Governance (SRE & Governance) department, an enabling group.

The role is expected to strategically lead incident retrospective and problem management operations, and to contribute to other SRE activities related to maintenance management, including availability, performance, change management, monitoring, capacity planning and the solutions derived from emergency response.

The Team Lead ensures that retrospective activities are orchestrated & carried out effectively while promoting a blameless culture in accordance with SRE principles.

Responsibilities:

  • Mentor the team in the seamless facilitation & conduct of root cause analysis (RCA) activities from end to end
  • Lead the facilitation for high-severity incidents, liaising with top/senior management and keeping them updated
  • Act as the primary focal point for presenting at the RCA Forum, Tech Risk Forum and other senior management meetings to report updates on retrospective findings & action plans
  • Absorb new technology rapidly & apply effectively
  • Communicate well with technical & non-technical colleagues
  • Work to a high standard with agreed timescales
  • Undertake any other reasonable tasks or duties requested by the supervisor or a member of the senior management team
  • Manage resources to ensure problem management activities are carried out effectively and efficiently
  • Provide available platforms and channels to ensure stakeholders are kept updated on results of retrospectives and RCA activities
  • Demonstrate authority in problem management calls
  • Act as the point of contact for assigned incidents of higher severity (from incident retrospective calls all the way up to Management Report (MR) documentation and publishing)
  • Take accountability for initiatives on the enhancement activities related to SRE as a result of retrospectives
  • Collaborate with Engineering Teams within SRE and with LOBs on enabling activities as part of the preventive measures

Requirements:

  • Minimum 15 years of process improvement/root cause analysis (RCA) exposure & involvement in leading discussions as a problem manager or incident commander, preferably in the Technology & Operations space
  • Experience with JIRA, Confluence, Jenkins, Nexus, SonarQube, Bitbucket, S3 & cloud computing
  • Good exposure to logging & monitoring tools like Dynatrace, Prometheus, Grafana, ELG/ELK
  • In-depth understanding of Incident & Problem Management functions & activities (i.e. hardware- & software-related incident & problem management)
  • Work with stakeholders & the command centre in troubleshooting, escalating & solutioning critical site incidents
  • Identify recurring system/application issues & work with the cloud team, infra teams, product development, vendors & other stakeholders in investigating & resolving the root cause
  • Maintain accurate documentation of incidents, including impact details, timelines & steps taken for mitigation/resolution
  • Strong verbal & written communication skills, particularly effective documentation skills
  • Minimum 10 years of software development, technical support or operations experience
  • Basic knowledge of Linux, AIX, Solaris and Windows
  • Exposure to enterprise databases, e.g. Oracle, SQL Server, MariaDB, MongoDB & Sybase
  • Knowledge in systems & multi-tier application & network troubleshooting
  • Essential knowledge & awareness of Public/Private/Hybrid cloud solutions.
