Key Responsibilities
- Provide L2/L3 production support for Java/web-based applications, primarily in banking environments.
- Manage 24/7 application support, including high-severity and high-impact incidents, with a focus on fast recovery and minimal disruption.
- Monitor, troubleshoot, and resolve application and infrastructure issues promptly.
- Perform root cause analysis and post-resolution follow-ups for incidents.
- Participate in disaster recovery drills and major incident investigations.
- Generate and maintain reports based on ticket metrics and application performance.
- Coordinate and execute infrastructure maintenance activities including patching, upgrades, and deployments.
- Manage and maintain Docker containers and batch jobs (e.g., TWS, cron jobs).
- Conduct regular health checks and performance monitoring using monitoring tools and scripts.
- Maintain documentation for technical procedures, incident handling, and change management.
- Support CI/CD pipelines and DevOps practices for application releases and change requests.
- Ensure System Integration Testing (SIT) and User Acceptance Testing (UAT) approval processes are completed, including business unit sign-off for UAT.
- Collaborate with developers, project teams, vendors, and infrastructure teams to ensure seamless operations.
- Drive effective communication between business and technology teams regarding production service reliability and performance.
- Champion production resilience and availability, with a focus on delivering a superior client experience.
- Drive the implementation of Site Reliability Engineering (SRE) and Chaos Engineering for strategic systems.
- Improve system reliability and availability by gathering data and designing for performance.
- Drive continuous improvements in processes and systems using SRE methodologies.
- Provide expert advice and training on technology solutions and advanced reliability techniques.
Required Skills & Experience
- Minimum 10 years of experience in production support, system administration, or application support roles.
- Strong knowledge of Linux/AIX/UNIX, Windows Server, and cloud platforms (AWS, Azure).
- Proficient in SQL, MySQL, MariaDB, and database batch scripting.
- Experience with WebSphere, WebLogic, Apache Tomcat, and VMware.
- Familiarity with monitoring tools, log analysis, and performance tuning.
- Hands-on experience with DevOps tools and CI/CD pipelines.
- Strong communication and problem-solving skills.
- Ability to work in high-pressure environments and manage multiple priorities.
Education & Certifications
- Bachelor of Science in Computer Science or a related field.