Incident Manager

Airtel Africa

City Of London

Hybrid

GBP 65,000 - 85,000

Full time

Today

Job summary

A leading telecommunications company in the City of London seeks an experienced Incident Manager to oversee technical incident responses. The role requires deep AWS expertise, strong analytical abilities in cloud-native environments, and excellent collaboration skills. You will lead cross-functional teams, monitor service health, and manage incidents, ensuring minimal business impact. Competitive compensation and potential benefits offered.

Qualifications

Proven experience in Incident Management or similar roles.
Strong understanding of web architecture and microservices.
Deep hands-on experience with AWS Cloud.

Responsibilities

Serve as the primary technical point of contact during incidents.
Lead and coordinate cross-functional incident response teams.
Monitor service health using tools like CloudWatch and Grafana.

Skills

AWS Cloud Services

Incident Management

Linux/Unix Administration

Agile Methodologies

Scripting (Python, Bash)

Monitoring Tools (CloudWatch, Grafana)

Debugging Skills

Leadership

Tools

Jira

Confluence

Git

Jenkins

Responsibilities

Serve as the primary technical point of contact during critical incidents, ensuring rapid resolution and minimal business impact.
Lead and coordinate cross-functional teams (engineering, support, operations) during incident response, including root cause analysis, mitigation strategies, and post-mortem reviews.
Monitor service health using tools such as CloudWatch, OpenSearch, Kibana, and Grafana, and proactively identify potential issues before they impact customers.
Troubleshoot and debug production issues in web architecture, microservices, and cloud environments.
Manage and maintain system reliability by implementing best practices in observability, monitoring, and alerting.
Collaborate closely with Software Development, Infrastructure, and Operations teams to improve incident response processes and system resilience.
Manage incidents related to AWS services such as EC2, S3, RDS, DynamoDB, Aurora, Redis, Memcached, Kafka, SNS, SQS, OpenSearch, and Elasticsearch.
Use Agile tools (Jira, Confluence) to track incident tickets, document resolutions, and maintain a clear audit trail.
Oversee system and application deployments, supporting automation pipelines (Jenkins, Git).
Perform Linux/Unix administration tasks as needed during incident investigation and resolution.
Continuously update and refine incident response playbooks, runbooks, and SOPs.
Provide regular incident reports to leadership, including root cause analysis and long-term corrective actions.

Requirements

Proven experience as an Incident Manager, Site Reliability Engineer (SRE), or Technical Operations Lead in cloud-native and microservices-based environments.
Strong understanding of web architecture and microservices development principles.
Deep hands-on experience with AWS Cloud Services: Compute (EC2, Lambda), Storage (S3), Databases (DynamoDB, RDS, Aurora), Messaging (Kafka, SNS, SQS), Caching (Redis, Memcached), Search (OpenSearch, Elasticsearch).
Expertise in Agile tools: Jira, Confluence, Git, Jenkins.
Strong Linux/Unix system administration skills, including troubleshooting and performance tuning.
Strong analytical skills with expertise in debugging complex distributed system issues.
Experience with monitoring and observability tools like CloudWatch, Grafana, Nagios, and Kibana.
Excellent communication and leadership skills to manage cross-functional incident response teams.
Experience in writing detailed post-incident reports and driving continuous improvement.
Strong scripting skills (Python, Bash, or similar) to automate diagnostic or remediation tasks.
