Overview
Location: REMOTE / Toronto, Ontario
This job allows you to work remotely.
Responsibilities
- Act as the focal point for Incident, Change, and Problem Management (ITSM) for critical contact centre applications, including IVR, Genesys CTI/Engage, call recording, Workforce Management (WFM), Nuance GK/Voice ID, Pindrop, API integrations, and Amazon Connect.
- Provide hands-on technical guidance for Genesys Engage and Amazon Connect (Amazon Connect experience is highly recommended).
- Oversee Day 2 operations support for 30+ applications, ensuring application uptime and user satisfaction meet KPIs.
- Lead and actively participate in Major Incident Response Teams (MIRT) for contact centre application incidents.
- Review changes, assess cross impacts, and coordinate with the Release Management team.
- Eliminate manual health checks by driving automation and monitoring through tools such as Dynatrace, ELK/Kibana, Cyara call monitoring, and Power BI. Site Reliability Engineering (SRE) experience is preferred.
- Drive blameless post-mortems for major incidents and foster cross-team collaboration for continual improvement.
- Architect for resilience by influencing design decisions that improve system reliability and performance.
- Develop and embed non-functional requirements (NFRs) during project scoping; lead the service transition from project to production (“Build to Run”).
- Run the production environment by monitoring system health holistically and ensuring high availability.
- Build and maintain systems that manage platform infrastructure and applications.
- Proactively measure, analyze, and optimize system performance to stay ahead of customer needs and drive innovation.
- Provide operational support and engineering for large-scale, distributed software applications.
Must Have Skills
- 8+ years of experience managing contact centre applications.
- Recent hands-on expertise supporting Genesys and Amazon Connect technologies.
- Proven leadership in vendor and escalation management.
- Deep understanding of the SDLC process and the ability to collaborate with development teams on runtime issues, solution sizing, and release planning.
- Strong experience with cloud and modern infrastructure tools: AWS Lambda, ECS Fargate, Azure, GitHub, OpenShift/container-based microservices, Amazon S3, Tomcat, and Apache.
- Practical experience setting up automated monitoring and dashboards using Dynatrace, Cyara, ELK, and similar tools.
- Bachelor’s degree in Computer Science, Systems Engineering, or a related technical field (e.g., Physics, Mathematics), or equivalent practical experience.
- Systematic problem-solving skills, strong ownership mindset, and excellent communication abilities.
- Demonstrated ability to lead a high-performing service delivery team.
Nice to Have Skills
- Working knowledge of AWS and cloud development best practices.
- ITIL Foundation certification is required; ITIL Strategic Leadership or Managing Professional certification is a strong plus.
- Prior experience in an SRE role or driving SRE practices within production environments.