The primary purpose of the role is to coordinate and restore services (application and infrastructure) in the event of a major incident. This includes any Application and TS events that occur across the organization. The goal is to reduce outage times during critical incidents and to work proactively to prevent future outages.
Areas of Responsibility :
Service :
Service transition.
Service operation.
Incident and Problem Management :
Define the process and RACI to ensure proper response to incidents and problems.
Provide initial analysis for identifying the incident context and top potential root causes.
Provide mitigation plans to fully mitigate or minimize the effect of the incident.
Lead the analysis to identify the problem and root cause areas.
Oversee postmortem reviews of problems, recommending changes or improvements to resolve each problem, or recording relevant risks on the risk register.
Design the configuration management architecture and content.
Create relevant artifacts, define processes, RACI, SLAs, and measurements to guide the improvement process.
Managing :
Assist in managing high-severity incidents at a specialist level.
Ensure regular proactive measures are in place.
Manage and guide staff for optimal service delivery.
Manage escalated incidents and communicate with related teams.
Demonstrate a strong commitment to professional service delivery.
Work effectively in a high-pressure environment.
Customer Service :
Ensure service delivery to the business through monitoring of SLAs.
Meet business expectations.
Uphold the image of Incident Management and delight customers.
Ensure team productivity and performance meet agreed standards.
Guidelines, Standards, and Reference Examples :
Participate in internal forums such as Support Services Work Group and lead work streams.
Contribute to methodology and standards.
Share knowledge acquired in the release process within the larger Support community.
Personal Attributes and Skills :
Cultivate innovation
Drive results
ITIL Process : Incident, Problem, and Configuration Management
Solid understanding of the systems development life cycle
8 to 10 years of domain knowledge
Business continuity
Incident trend analysis skills
Stress management
Analytical thinking
Personal organization and time management skills
Ability to coach and train people
Learning orientation
Excellent understanding of : network technology and design, AD architecture, JVM architecture and monitoring, JEE and microservices architecture, API gateways, OS architecture, PKI, Kerberos, distributed architecture and parallelization, relational databases (ideally Oracle DB architecture), containerization (Docker, Kubernetes), virtual machines
Reasonable understanding of : the purpose and design of VDI solutions, Fibre Channel storage area networks
Education and Experience :
Matric / Grade 12.
ITIL Foundation Course or ITIL Incident Management Certification.
10 years of working experience in technology and system architecture.