A cloud infrastructure management company based in Toronto is seeking a skilled professional to manage and optimize monitoring systems. The role involves responding to incidents, maintaining infrastructure, and collaborating with teams to ensure reliability. Ideal candidates will have expertise in cloud platforms, particularly Microsoft Azure, and strong scripting skills. This position offers a dynamic work environment and opportunities for growth.
Job Description:
Monitoring and Alerting: Implement and maintain monitoring systems that proactively identify potential issues and alert engineers before problems impact users.
Incident Response: Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service.
Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual effort.
Performance Optimization: Identify and address performance bottlenecks so that systems run efficiently and effectively.
Infrastructure Management: Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources.
Capacity Planning: Plan for future capacity needs so that systems can handle anticipated workloads.
Release Engineering: Develop and maintain processes for deploying software updates and releases.
Collaboration: Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability.
Documentation: Maintain clear and concise documentation of systems, processes, and procedures.
Continuous Improvement: Identify areas for improvement and implement changes to enhance system reliability and performance.