VP, Problem & Knowledge Management Specialist, SRE & Governance, Group...

DBS Bank Limited

Singapore

On-site

SGD 120,000 - 160,000

Full time

Yesterday

Job summary

A leading financial institution in Singapore is seeking a Team Lead for its SRE Problem and Knowledge Management team. The role requires over 15 years of experience in process improvement and incident management. The ideal candidate will mentor the team, lead high-severity incidents and collaborate with engineering teams. Proficiency with tools such as JIRA and Confluence, along with knowledge of cloud computing, is essential. Strong communication skills and accountability for problem management are key to this position.

Qualifications

Minimum 15 years of experience in process improvement and root cause analysis.
Strong exposure to incident and problem management functions.
Minimum 10 years of software development or technical support experience.

Responsibilities

Mentor the team in root cause analysis activities.
Lead facilitation for high-severity incidents.
Communicate effectively with technical and non-technical colleagues.

Skills

Process improvement

Root cause analysis

Incident management

Cloud Computing

Strong communication skills

Tools

JIRA

Confluence

Jenkins

Dynatrace

Prometheus

Grafana

Oracle

SQL Server

MongoDB

The Role

This position is for an SRE Problem and Knowledge Management Team Lead within the Site Reliability Engineering and Governance (SRE & Governance) department, an enabling group.

This role is expected to strategically lead incident retrospective and problem management operations, as well as other SRE activities related to maintenance management, including availability, performance, change management, monitoring, capacity planning and the solutions derived from emergency response.

The Team Lead ensures that retrospective activities are orchestrated and carried out effectively while promoting a blameless culture in line with SRE principles.

Responsibilities

Mentor the team in the seamless, end-to-end facilitation and conduct of root cause analysis (RCA) activities
Lead the facilitation of high-severity incidents, liaising with top/senior management and keeping them updated
Act as the prime focal point for presenting at the RCA Forum, Tech Risk Forum and other senior management meetings, reporting updates on retrospective findings and action plans
Absorb new technology rapidly and apply it effectively
Communicate well with technical and non-technical colleagues
Work to a high standard within agreed timescales
Undertake any other reasonable tasks or duties requested by the supervisor or a member of the senior management team
Manage resources to ensure problem management activities are carried out effectively and efficiently
Provide platforms and channels to keep stakeholders updated on the results of retrospectives and RCA activities
Demonstrate authority in problem management calls
Serve as the point of contact for assigned higher-severity incidents (from incident retrospective calls all the way through to Management Report (MR) documentation and publishing)
Take accountability for SRE enhancement initiatives arising from retrospectives
Collaborate with Engineering Teams within SRE and with LOBs on enabling activities as part of preventive measures

Requirements

Minimum 15 years of process improvement/root cause analysis (RCA) exposure, including leading discussions as a problem manager or incident commander, preferably in the Technology & Operations space
Experience with JIRA, Confluence, Jenkins, Nexus, SonarQube, Bitbucket, S3 and cloud computing
Good exposure to logging & monitoring tools like Dynatrace, Prometheus, Grafana, ELG/ELK
In-depth understanding of Incident & Problem Management functions and activities (i.e. hardware- and software-related incident and problem management)
Work with stakeholders and the command centre in troubleshooting, escalating and resolving critical site incidents
Identify recurring system/application issues and work with the cloud team, infra teams, product development, vendors and other stakeholders to investigate and resolve their causes
Maintain accurate documentation of incidents, including impact details, timelines and steps taken for mitigation/resolution
Strong verbal and written communication skills, particularly effective documentation skills
Minimum 10 years of software development, technical support or operations experience
Basic knowledge of Linux, AIX, Solaris and Windows
Exposure to enterprise databases, e.g. Oracle, SQL Server, MariaDB, MongoDB and Sybase
Knowledge of systems, multi-tier application and network troubleshooting
Knowledge and awareness of public, private and hybrid cloud solutions (essential)