Role Overview
We are seeking an experienced Incident Manager / Operations Manager with strong technical and communication skills to lead Sev-1 incidents on a 24x7 standby rotation. The role requires someone who is proactive, detail-oriented, and able to coordinate quickly across multiple teams to restore services and document incidents professionally.
The technology stack is primarily .NET, supported by web services, REST APIs, and batch jobs, with workloads hosted in GCC/AWS environments.
Key Responsibilities
Incident Management & Operations
- Serve as the primary on-call contact for Sev-1 incidents (24x7 standby; typically 1–2 activations per month).
- Manage the end-to-end incident lifecycle, from detection through resolution and closure.
- Coordinate with development, infrastructure, cloud, and business teams to restore services as quickly as possible.
- Provide clear, timely communication and status updates during incident handling.
- Lead technical bridge calls and ensure engagement of all required stakeholders.
- Track incident progress, monitor SLAs, and escalate issues proactively.
Technical Troubleshooting
- Understand and troubleshoot issues on applications built using .NET technologies.
- Analyse failures involving web services, REST APIs, batch processes, and integrations.
- Review application and server logs, identify root causes, and coordinate permanent fixes with engineering teams.
- Work with cloud teams on issues related to GCC/AWS infrastructure, including connectivity, performance, and service failures.
Reporting & Documentation
- Prepare incident reports, including timeline, technical details, corrective actions, and preventive recommendations.
- Maintain accurate incident records, including root cause analysis (RCA) and follow-up actions.
- Document recurring issues and propose long-term improvements to reduce incident volume and severity.
Required Skills & Experience
Must‑Have Technical Skills
- At least 4 years of experience in IT Operations, Incident Management, or Application Support.
- Strong understanding of .NET applications, web services, REST APIs, and backend batch jobs.
- Experience troubleshooting cloud‑hosted systems (preferably GCC or AWS).
- Ability to analyse logs, server metrics, and technical alerts.
Incident & Operations Skills
- Prior experience in Sev‑1 / P1 incident handling.
- Ability to lead war‑room calls and coordinate with multi‑disciplinary teams.
- Strong problem‑solving mindset with a calm and structured approach under pressure.
- Experience handling on‑call rotations and after‑hours support.
Communication & Reporting Skills
- Excellent written and verbal communication skills.
- Ability to document event flows, timelines, and actions clearly.
- Strong stakeholder engagement skills and the ability to communicate with both technical and non‑technical teams.