Role Purpose
The Incident and Service Management Specialist acts as the primary responder and driver for all major incidents within the organisation. The role ensures timely service restoration by engaging and managing multi‑disciplinary support teams and vendors. Deep hands‑on diagnostics are not required, but the specialist must have a sound technical understanding to interpret issues accurately, challenge vendor assessments, and uphold service standards.
Key Responsibilities
Major Incident Management
- Serve as the first point of engagement for all major incidents.
- Initiate war rooms and coordinate all support towers (Infrastructure, Network, Applications, Cloud, EUC).
- Drive incident restoration efforts and ensure rapid recovery of services.
- Understand technical symptoms, articulate probable impact paths, and direct teams effectively.
- Challenge vendor diagnoses and ensure alignment with SLAs/OLAs.
- Trigger escalation protocols for vendor delays or SLA risks.
- Support the Post‑Incident Review (PIR) and contribute to root cause analysis.
- Ensure complete documentation of incidents, timelines, vendor performance, and interim fixes.
- Track and report on all improvement actions arising from the PIR until closure.
Communication & Coordination
- Provide clear, structured, and timely updates to internal stakeholders throughout the incident lifecycle.
- Coordinate with the Security Operations Centre (SOC) for cybersecurity‑related incidents.
Governance & Continuous Improvement
- Uphold Government ICT policies, IM8 guidelines, PDPR and other security policies and standards.
- Ensure adherence to ITIL service management processes for Incident, Problem, Change, and Service Level Management.
Required Skills and Competencies
Technical Skills
- Infrastructure: servers, operating systems, virtual machines, storage technologies
- Network: routing, switching, firewalls, load balancers
- Applications: web, API, middleware, modern application stacks
- Cloud: AWS and Azure platforms
- EUC/VDI: endpoint and virtual desktop technologies
- Familiarity with monitoring, SIEM, and APM tools
- Proficiency in ITSM platforms such as ServiceNow or Remedy
Process Skills
- Strong knowledge of ITIL processes.
- Experience running multi‑vendor war rooms and driving SLA accountability.
- Understanding of escalation protocols and structured communication frameworks.
Behavioural Competencies
- Excellent communication skills and composure under crisis conditions.
- Strong analytical skills and situational awareness.
- Ability to lead without authority across technical teams during high‑impact incidents.
- Confident stakeholder management at both technical and managerial levels.
Qualifications & Experience
Education / Certifications
- Diploma or Degree in IT, Computer Science, Software Engineering, or related field.
- CISSP or ITIL Foundation certification is an advantage.
Optional but advantageous
- Certification in application monitoring tools (e.g., Dynatrace Associate, New Relic, AppDynamics).
- Basic DevOps‑related certifications (e.g., AWS Developer Associate, Azure Developer Associate) if supporting modern apps.
- Firewall vendor certifications (e.g., Fortinet NSE, Palo Alto ACE).
Experience
- 1–3 years’ experience in incident management, client success management, NOC/SOC, or IT operations.
- Experience working in client‑facing, multi‑vendor, multi‑technology environments.
Technical Competency
- Understanding of application components (frontend, backend, API gateway, middleware).
- Familiarity with common application issues (timeouts, dependency failures, code exceptions).
- Ability to interpret basic stack traces or error messages (not hands‑on development).
- Ability to interpret common cloud issues (latency, resource limits, region outages).
- Familiarity with cloud networking concepts (VPC, NSG, load balancing).
- Knowledge of Windows and macOS systems, Active Directory (AD) authentication, and Group Policy.