Responsibilities:
Operational Support
- Lead and coordinate Level 2 support operations for mission-critical applications and infrastructure
- Provide troubleshooting and diagnostics for incidents escalated from Level 1
- Ensure adherence to SLAs and system availability targets
Application Support
- Lead the resolution of application incidents escalated from Level 1; perform root cause analysis and apply workarounds where possible
- Lead the monitoring of application logs and integration points such as REST APIs, message queues and file-based transfers
- Liaise with Level 3 to resolve complex application issues and escalate bugs or enhancement requests
- Lead the support and maintenance of job schedulers, interface configurations and integration points
- Lead the documentation of known issues, resolution procedures and rollback steps in the knowledge base
Incident & Problem Management
- Act as incident manager for P1/P2 issues
- Coordinate resolution and communications
- Perform root cause analysis and recommend permanent fixes
- Escalate unresolved issues that require software coding to Level 3 or engineering teams
Change Management
- Perform operational impact assessment
- Serve on the Change Advisory Board (CAB) to review and approve changes
- Carry out pre-change preparation, such as reviewing the Change Request and Release Plan
- Supervise post-change production verification
- Documentation update and knowledge transfer
- Post-change review and feedback
Patch Management
- Assess patch management readiness
- Stakeholder coordination and team coordination
- System Readiness and Post-Patch Validation
- Documentation update and knowledge transfer
- Compliance and audit readiness
Documentation and Compliance
- Operational documentation: SOPs, incident response checklists, RCA and PIR reports, monitoring and alert guidebook
- Configuration and infrastructure documentation: system configuration baselines, application dependency maps, and environment inventories such as hosts, services and accounts
- Knowledge base articles for Level 2 enablement and faster resolution, e.g. known errors and fixes, frequent how-to guides, script repositories, lessons learned
- Knowledge Management
Configuration Management
- Validate the accuracy of configurations
- Maintain readiness of operational documentation
- Perform audits to confirm configuration compliance
- CMDB asset verification
- Change-linked configuration tracking
- Ensure environment consistency across DEV, IVVQ, ISO-PROD, UAT and PROD
Testing and Verification
- Ensure operational readiness testing is completed before production rollout
- Coordinate post-change verification
- Perform regression and sanity tests following patching or upgrades, in UAT and PROD
- Participate in user acceptance testing
Knowledge Management
- Documentation of resolution
- Knowledge Base Contribution
- Validation of knowledge
- Subject Matter Expertise Sharing
Root Cause Analysis
- Gather logs and system metrics from the time of failure
- Reproduce issues in a controlled environment to understand the conditions under which they occur
- Determine the scope and severity in terms of the systems affected, downtime duration and business impact
- Narrow down the possible sources of the failure
- Use diagnostic tools to analyse application behaviour
- Correlate events to sequence the chain of events leading up to the failure and identify dependencies
Leadership
- Supervise and guide Level 2 engineers on change requests and service requests
- Lead and manage day-to-day operations of Level 2 support
- Track and report the Level 2 key performance indicators such as resolution rate, mean time to resolve and system availability
- Process and quality improvement: document known issues, troubleshooting steps and standard operating procedures, and propose improvements to incident handling
- Identify tools and systems to streamline Level 2 support operations
Requirements:
Education and Experience
- Bachelor's degree in Information Technology, Computer Science, Engineering, or a closely related discipline
- At least 5 years of Level 2 support experience in mission-critical 24x7 production environments, preferably in the public sector
- At least 2 years in a team lead or supervisory role, coordinating tasks and mentoring junior engineers
- Proven experience in handling P1/P2 incidents, managing post-incident reviews (PIRs) and root cause analysis
- Certification in Red Hat Enterprise Linux or Kubernetes preferred
Knowledge/Skills
- Operating Systems: RHEL (90%) and Windows Server (10%)
- Networking Fundamentals
- Middleware & Infrastructure (Web Server - Nginx, App Servers - Kubernetes with containers (Docker + Spring Boot))
- Message Queues (IBM MQ, Kafka)
- Java, C#, MQTT, Golang
- Database (SQL Server, PostgreSQL)
- ITIL/ITSM Process Knowledge
- Security Awareness
- DR and HA concepts
- Leadership & Coordination
- Communication & Collaboration
- Operational Governance
Goel Navneet License No.: 02C3423 Personnel Registration No.: R1982194
Please note that your response to this advertisement and communications with us pursuant to this advertisement will constitute informed consent to the collection, use and/or disclosure of personal data by ManpowerGroup Singapore for the purpose of carrying out its business, in compliance with the relevant provisions of the Personal Data Protection Act 2012. To learn more about ManpowerGroup's Global Privacy Policy, please visit https://www.manpower.com.sg/privacy-notice.