We are looking for an experienced Lead Application Support Engineer to join our client’s dynamic Platform and Infrastructure Team. In this fast-paced, innovative environment, you’ll play a pivotal role in ensuring the stability, reliability, and performance of mission-critical customer-facing systems, keeping them operational 24/7.
As a Lead Application Support Engineer, you will lead the support team, manage incidents, and keep critical customer-facing applications and systems running smoothly.
Your Role
- Proactive Monitoring and Incident Management: Oversee application performance by actively monitoring systems, promptly responding to alerts, and troubleshooting complex technical issues to ensure high service levels.
- Team Leadership and Mentorship: Lead and inspire a team of talented Application Support Engineers. Provide guidance, mentorship, and line management to foster a culture of collaboration, continuous improvement, and operational excellence.
- Process Optimization: Develop, implement, and refine processes to enhance efficiency, minimize incident impact, and ensure smooth operations.
- Documentation and Knowledge Sharing: Maintain and improve comprehensive documentation to promote knowledge sharing and operational readiness.
- Problem Prevention: Identify and resolve potential issues in production environments before they impact users.
- Incident Reporting: Create and present detailed incident reports, clearly outlining root causes, resolutions, and preventive measures.
- Cross-Team Collaboration: Work closely with development and operations teams to prepare for upcoming releases, ensuring smooth rollouts and system stability.
- Scheduling and On-Call Management: Manage on-call rotation schedules and shift patterns to guarantee 24/7 system coverage. Participate in the on-call rotation as needed.
What We Look For In You
- Leadership & Management: Exceptional verbal, written, and interpersonal communication skills to engage effectively with cross-functional teams, stakeholders, and clients in a fast-paced environment.
- Application Expertise: Proven experience supporting large-scale, high-availability applications and systems, especially in gaming platforms and related technologies.
- Advanced Troubleshooting: Deep expertise in debugging and resolving critical production issues, including analyzing logs, audit trails, and performance metrics.
- Monitoring Mastery: Hands-on experience with industry-standard monitoring tools like Grafana, Kibana, and the ELK stack to ensure optimal platform performance and user experience.
- Alert Management: Skilled in configuring, maintaining, and fine-tuning alerts to proactively detect and mitigate system issues, minimizing downtime.
- Automation Savvy: Experience with scripting and automation (e.g., Python, Bash, or other shell scripting) to remediate production issues, streamline operations, and improve system efficiency.
Technical Expertise
- Database Management: Advanced proficiency in Oracle SQL to ensure database reliability and performance.
- Scripting and Automation: Expertise in Bash and other shell scripting, and in task automation, to support smooth application operations.
- Containerization: Proficient with Docker to enhance scalability and deployment efficiency.
- Monitoring and Logging: Comprehensive knowledge of monitoring and logging tools (Kibana, Elastic Stack, Grafana) for real-time insights into platform performance.
- CI/CD Practices: Familiarity with Jenkins and other CI/CD tools for deploying updates with minimal disruption.
- Cloud Platforms: Hands-on experience with AWS, Azure, and GCP to manage scalable, secure infrastructure.
- DevOps and SRE Collaboration: Partner with DevOps and SRE teams to ensure operational excellence and platform reliability.
- API Diagnostics: Proficient in troubleshooting APIs by issuing and inspecting HTTP requests to ensure seamless integrations.
What You Need To Do Next
If this role interests you and you believe you have the required experience, please contact me directly.
Email: [Your contact information]
Follow me on LinkedIn for the latest jobs and career opportunities.