This role owns the end-to-end incident and problem management process, driving timely resolution of incidents, root cause analysis, and continuous improvement initiatives. The role also works closely with a 24x7 Command Center operation to ensure continuous monitoring, rapid response, and operational excellence. As a leader in IT service operations supporting digital transformation, this role champions AI adoption, process optimization, and shift‑left strategies to enhance service delivery and reduce manual overhead while maintaining operational stability.
Responsibilities
- Run the incident management process to ensure rapid restoration of services.
- Coordinate major incident response, including communication with stakeholders and escalation management.
- Ensure adherence to SLAs and KPIs for incident resolution.
- Maintain accurate incident records and reporting.
- Drive root cause analysis for recurring incidents and major problems.
- Oversee the implementation of permanent fixes and preventive measures.
- Maintain the knowledge base of problems and ensure effective knowledge sharing.
- Collaborate with engineering and operations teams to reduce recurring incident volume.
- Review incident trends to identify preventive measures that reduce incident occurrence.
- Define and enforce incident and problem management policies and procedures, and ensure they are reviewed annually.
- Monitor process performance and identify improvement opportunities.
- Provide training and guidance to teams and partners on best practices.
- Prepare and present regular reports to senior management.
- Implement shift‑left strategies to streamline how infrastructure operations teams respond to common alerts and incidents.
- Act as the point of contact for audits related to incident and problem management.
- Act as the escalation point of contact for incident and problem management.
- Communicate effectively with business units, vendors, and leadership during critical events.
- Ensure transparency and timely updates throughout the incident lifecycle, including post‑incident reporting to Group Risk Management.
- Take accountability for business and regulatory compliance risks and take appropriate steps to mitigate them.
- Maintain awareness of industry trends in regulatory compliance, emerging threats, and technologies to understand the risks and better safeguard the company.
- Highlight potential concerns or risks and proactively share risk‑management best practices.
Qualifications
- A Bachelor’s or Professional Degree in IT or Computer Science.
- 10+ years of experience in IT Service Management, with expertise in Incident and Problem Management, and more than 5 years in a leadership role.
- Hands‑on experience with a major service‑management platform such as ServiceNow or an equivalent.
- Strong understanding and demonstrated application of ITIL processes, particularly Incident and Problem Management.
- Experience working with 24x7 operations and shift‑based teams.
- Familiarity with AIOps, shift‑left, and self‑healing IT concepts.
- Knowledge of generative AI for IT operations (e.g., automated change plans, anomaly detection).
- Experience working with multiple teams (business, IT application and infrastructure teams) for service management and operations.
- Planning and organisational skills for undertaking team leadership and BAU workload management.
- Self‑driven with a positive attitude and strong leadership skills to champion culture‑change management initiatives within IT and with interfacing business stakeholders.
- Strong, fact‑based communication skills, with the ability to manage conflicts and difficult situations and to influence others.
- Good analytical skills and a creative approach to problem solving, with the ability to propose options and solutions.
- Experience with budget management and vendor management.
- Excellent leadership and stakeholder management abilities.
- High level of integrity, accountability for own work, and a positive attitude toward teamwork.
- Takes initiative to improve the current state and adapts readily to change.