L2 Infrastructure Lead | Operational

MANPOWER STAFFING SERVICES (SINGAPORE) PTE LTD

Singapore

On-site

SGD 60,000 - 80,000

Full time

2 days ago

Job summary

A staffing services company in Singapore is seeking a Level 2 Support Engineer to lead critical application support operations. The ideal candidate will have over 5 years of experience in 24x7 support environments, strong incident management skills, and team leadership capabilities. Responsibilities include troubleshooting key incidents and ensuring operational readiness through documentation and knowledge management.

Qualifications

  • At least 5 years in Level 2 support for mission-critical 24x7 production environments.
  • Proven experience in handling P1/P2 incidents and managing post-incident reviews.
  • At least 2 years in a team lead or supervisory role.

Responsibilities

  • Lead and coordinate Level 2 support operations for mission-critical applications.
  • Perform root cause analysis and workarounds for application incidents.
  • Act as incident manager for P1/P2 issues and coordinate resolutions.

Skills

Operating Systems (RHEL and Windows Server)
Networking Fundamentals
Middleware & Infrastructure
Message Queues
Java
C#
Database Knowledge
ITIL/ITSM Process Knowledge
Leadership & Coordination
Communication & Collaboration

Education

Bachelor's degree in Information Technology, Computer Science, or Engineering

Tools

RHEL
Kubernetes
Java
SQL Server
PostgreSQL

Job description

Responsibilities:

Operational Support

  • Lead and coordinate Level 2 support operations for mission-critical applications and infrastructure
  • Provide troubleshooting and diagnostics for incidents escalated from Level 1
  • Ensure adherence to SLAs and system availability targets

Application Support

  • Lead the resolution of application incidents escalated from Level 1; perform root cause analysis and apply workarounds where possible
  • Lead the monitoring of application logs and integration points such as REST APIs, message queues and file-based transfers
  • Liaise with Level 3 to resolve complex application issues and escalate bugs or enhancement requests
  • Support and maintain job schedulers, interface configurations and integration points
  • Document known issues, resolution procedures and rollback steps in the knowledge base

Incident & Problem Management

  • Act as incident manager for P1/P2 issues
  • Coordinate resolution and communications
  • Perform root cause analysis and recommend permanent fixes
  • Escalate unresolved issues that require code changes to Level 3 or engineering teams

Change Management

  • Perform operational impact assessment
  • Serve on the Change Advisory Board (CAB) to review and approve changes
  • Carry out pre-change preparation, such as reviewing the change request and release plan
  • Supervise post-change production verification
  • Documentation update and knowledge transfer
  • Post change review and feedback

Patch Management

  • Perform patch management readiness
  • Stakeholder coordination and team coordination
  • System Readiness and Post-Patch Validation
  • Documentation update and knowledge transfer
  • Compliance and audit readiness

Documentation and Compliance

  • Operational documentation: SOPs, incident response checklists, RCA and PIR reports, monitoring and alert guidebook
  • Configuration and infrastructure documentation: system configuration baselines, application dependency maps, and environment inventories such as hosts, services and accounts
  • Knowledge base articles for Level 2 enablement and faster resolution, e.g. known errors and fixes, frequent how-to guides, script repositories, lessons learned
  • Knowledge Management

Configuration Management

  • Validate configurations and verify their accuracy
  • Maintain readiness of operational documentation
  • Perform audit to confirm compliance of configurations
  • CMDB asset verification
  • Change-linked configuration tracking
  • Ensure environment consistency across DEV, IVVQ, ISO-PROD, UAT and PROD

Testing and Verification

  • Ensure operational readiness testing before production deployment rollout
  • Ensure post-change verification coordination
  • Perform regression and sanity tests following patching or upgrades, in UAT and PROD
  • Participation in user acceptance testing

Knowledge Management

  • Documentation of resolution
  • Knowledge Base Contribution
  • Validation of knowledge
  • Subject Matter Expertise Sharing

Root Cause Analysis

  • Gather logs and system metrics from the time of failure
  • Reproduce issues in a controlled environment to understand the conditions under which they occur
  • Determine the scope and severity in terms of the systems affected, downtime duration and business impact
  • Narrow down the possible sources of the failure
  • Use diagnostic tools to analyse the application behaviour
  • Correlate events to sequence the chain of events leading up to the failure and identify the dependencies

Leadership

  • Supervise and guide Level 2 engineers on change requests and service requests
  • Lead and manage day-to-day operations of the Level 2 support
  • Track and report the Level 2 key performance indicators such as resolution rate, mean time to resolve and system availability
  • Process and quality improvement: document known issues, troubleshooting steps and standard operating procedures, and propose improvements to incident handling
  • Identify tools and systems to streamline Level 2 support operations

Requirements:

Education and Experience

  • Bachelor's degree in Information Technology, Computer Science, Engineering, or a closely related discipline
  • At least 5 years in Level 2 support for mission-critical 24x7 production environments, preferably in the public sector
  • At least 2 years in a team lead or supervisory role, coordinating tasks and mentoring junior engineers
  • Proven experience in handling P1/P2 incidents, managing post-incident reviews (PIRs) and root cause analysis
  • Certification in Red Hat Enterprise Linux or Kubernetes is preferred

Knowledge/Skills

  • Operating Systems: RHEL (90%) and Windows Server (10%)
  • Networking Fundamentals
  • Middleware & Infrastructure (Web Server: Nginx; App Servers: Kubernetes with containers (Docker + Spring Boot))
  • Message Queues (IBM MQ, Kafka)
  • Java, C#, MQTT, Golang
  • Database (SQL Server, PostgreSQL)
  • ITIL/ITSM Process Knowledge
  • Security Awareness
  • DR and HA concepts
  • Leadership & Coordination
  • Communication & Collaboration
  • Operational Governance