Clear-Azure Site Reliability Engineer

Cynet Systems Inc

Toronto

On-site

CAD 80,000 - 100,000

Full time

Today

Job summary

A cloud infrastructure management company based in Toronto is seeking a skilled professional to manage and optimize monitoring systems. The role involves responding to incidents, maintaining infrastructure, and collaborating with teams to ensure reliability. Ideal candidates will have expertise in cloud platforms, particularly Microsoft Azure, and strong scripting skills. This position offers a dynamic work environment and opportunities for growth.

Qualifications

  • Experience with cloud platforms, especially Microsoft Azure.
  • Strong understanding of monitoring tools like Dynatrace and Grafana.
  • Proficiency in scripting languages such as Python and Shell.
  • Knowledge of container services like Kubernetes and Docker.

Responsibilities

  • Implement and maintain monitoring systems to proactively identify potential issues.
  • Respond to incidents and outages to minimize downtime and restore service.
  • Automate repetitive tasks to improve efficiency and reduce manual effort.
  • Manage and maintain the underlying infrastructure including servers and cloud resources.
  • Plan for future capacity needs to handle anticipated workloads.
  • Develop and maintain processes for deploying software updates and releases.
  • Collaborate with various teams to ensure system reliability and availability.
  • Maintain clear documentation of systems, processes, and procedures.

Skills

  • Cloud platform: Microsoft Azure
  • Excellent knowledge of AKS
  • Monitoring tools: Dynatrace Client, Grafana
  • Operating systems: Windows, Linux
  • Scripting: Shell scripting, Python, PowerShell
  • Databases: MySQL, Oracle, SQL
  • Container services: Kubernetes, Docker, Helm
  • Understanding of Camunda

Job description

Responsibilities

  • Monitoring and Alerting: Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users.
  • Incident Response: Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service.
  • Automation: Automate repetitive tasks and processes to improve efficiency and reduce manual effort.
  • Performance Optimization: Identify and address performance bottlenecks to ensure systems run efficiently and effectively.
  • Infrastructure Management: Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources.
  • Capacity Planning: Plan for future capacity needs to ensure systems can handle anticipated workloads.
  • Release Engineering: Develop and maintain processes for deploying software updates and releases.
  • Collaboration: Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability.
  • Documentation: Maintain clear and concise documentation of systems, processes, and procedures.
  • Continuous Improvement: Identify areas for improvement and implement changes to enhance system reliability and performance.

Skills and Qualifications

  • Cloud platform: Microsoft Azure.
  • Excellent knowledge of AKS.
  • Monitoring tools: Dynatrace Client, Grafana.
  • Operating systems: Windows, Linux.
  • Scripting: Shell scripting, Python, PowerShell.
  • Databases: MySQL, Oracle, SQL database management.
  • Container services: Kubernetes, Docker, Helm.
  • Understanding of Camunda is preferable.