Key Responsibilities
1. Team Leadership & People Management
- Lead and supervise the operations team and ensure 24/7 staffing coverage.
- Provide mentorship, coaching, and skill development for junior engineers.
- Conduct performance reviews, identify training gaps, and drive continuous improvement across the team.
- Act as the point of escalation for all operational matters.
2. Data Centre Operations Management
- Oversee day-to-day operations across data centre facilities, systems, networks, and security tools to ensure efficiency, stability, and continuous service availability.
- Ensure compliance with Data Centre SOPs, operational policies, and security guidelines.
- Coordinate equipment movement, installation, decommissioning, and preventive maintenance activities.
- Review shift reports and incident logs, and ensure proper documentation for audits.
- Drive readiness, preventive maintenance, audit compliance, and operational excellence initiatives.
3. Incident & Problem Management
- Act as the senior escalation point for critical incidents across systems, networks, and security technologies.
- Provide expert guidance on infrastructure issues including Windows/Linux servers, virtualization, storage, security appliances, and network technologies.
- Review RCA reports, lead problem management, and drive long-term remediation plans.
- Oversee complex change requests, service requests, patching cycles, and integration activities.
- Ensure alignment with ITIL processes and industry best practices.
- Oversee triage and root cause analysis (RCA), and ensure timely closure of incidents and service requests.
- Ensure incidents, alarms, and alerts are properly logged, categorised, prioritised, and tracked in accordance with SLAs.
- Liaise with customers, vendors, and internal stakeholders for critical incidents and troubleshooting.
4. Systems & Network Operations Oversight
- Oversee monitoring and maintenance of servers, operating systems, and storage systems.
- Oversee network operations, including switches, routers, firewalls, VPNs, and load balancers.
- Ensure routine patching, backup operations, and system health checks are carried out.
- Guide the team on best practices for authentication, authorisation, encryption, and configuration management.
5. Vendor & Stakeholder Management
- Coordinate with external vendors for maintenance, replacement, and enhancement activities.
- Ensure timely follow-up on open tickets, service disruptions and preventive maintenance.
- Communicate operational updates, risks, and issues to management and customers.
6. Compliance, Documentation & Reporting
- Ensure operational documentation (SOPs, checklists, incident reports, RCAs, inventory, access logs) is kept up to date and accurate.
- Drive audit readiness for ISO, IT security audits, and internal governance requirements.
- Prepare periodic operational reports and dashboards for management.
Job Requirements
Education & Experience
- Diploma/Degree in Computer Science, IT, Engineering, or a related field.
- 10 to 15 years of hands-on experience in IT infrastructure / data centre operations.
- Minimum 5 years of experience leading a technical team, operation centre, or shift-based environment.
Technical Skills
- IT Infrastructure (Windows / Linux servers, virtualization, storage)
- Network technologies (L2/L3 concepts, routing, switching, firewalls)
- Azure Stack Hub / hybrid cloud environments
- Storage: S2D, SAN/NAS, Unity/VNX
- Windows HCI
- Kubernetes / SDN networking knowledge
- Security tools/products (e.g., BeyondTrust, SEPM, RSA, Palo Alto, Check Point, FortiGate, SafeNet)
- Data Centre operations (monitoring tools, hardware handling)
- Backup operations, tape library management, and DR procedures
- Incident management and ITIL framework
- Knowledge of authentication, encryption, and access management concepts
Key Competencies
- Strong leadership, communication, and stakeholder management skills.
- Proven ability to manage crisis situations, critical escalations, and high‑severity incidents.
- Strong analytical and problem‑solving mindset.
- Ability to develop team members, build processes, and drive operational excellence.
- Ability to support 24/7 operations when required (for example, during escalations or major incidents).