Clear-Azure Site Reliability Engineer

Cynet systems Inc

Toronto

On-site

CAD 80,000 - 100,000

Full time

Today

Job summary

A cloud infrastructure management company based in Toronto is seeking a skilled professional to manage and optimize monitoring systems. The role involves responding to incidents, maintaining infrastructure, and collaborating with teams to ensure reliability. Ideal candidates will have expertise in cloud platforms, particularly Microsoft Azure, and strong scripting skills. This position offers a dynamic work environment and opportunities for growth.

Qualifications

Experience with cloud platforms, especially Microsoft Azure.
Strong understanding of monitoring tools like Dynatrace and Grafana.
Proficiency in scripting languages such as Python and Shell.
Knowledge of container services like Kubernetes and Docker.

Responsibilities

Implement and maintain monitoring systems to proactively identify potential issues.
Respond to incidents and outages to minimize downtime and restore service.
Automate repetitive tasks to improve efficiency and reduce manual effort.
Manage and maintain the underlying infrastructure, including servers and cloud resources.
Plan for future capacity needs to handle anticipated workloads.
Develop and maintain processes for deploying software updates and releases.
Collaborate with various teams to ensure system reliability and availability.
Maintain clear documentation of systems, processes, and procedures.

Skills

Cloud platform: Microsoft Azure

Excellent knowledge of AKS (Azure Kubernetes Service)

Monitoring tools: Dynatrace Client, Grafana

Operating systems: Windows, Linux

Scripting: shell scripting, Python, PowerShell

Databases: MySQL, Oracle SQL

Container services: Kubernetes, Docker, Helm

Understanding of Camunda

Overview

Job Description: Monitoring and Alerting. Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users. Incident Response. Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service. Automation. Automate repetitive tasks and processes to improve efficiency and reduce manual effort. Performance Optimization. Identify and address performance bottlenecks to ensure systems run efficiently and effectively. Infrastructure Management. Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources. Capacity Planning. Plan for future capacity needs to ensure systems can handle anticipated workloads. Release Engineering. Develop and maintain processes for deploying software updates and releases. Collaboration. Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability. Documentation. Maintain clear and concise documentation of systems, processes, and procedures. Continuous Improvement. Identify areas for improvement and implement changes to enhance system reliability and performance.

Responsibilities

Monitoring and Alerting.
Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users.
Incident Response.
Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service.
Automation.
Automate repetitive tasks and processes to improve efficiency and reduce manual effort.
Performance Optimization.
Identify and address performance bottlenecks to ensure systems run efficiently and effectively.
Infrastructure Management.
Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources.
Capacity Planning.
Plan for future capacity needs to ensure systems can handle anticipated workloads.
Release Engineering.
Develop and maintain processes for deploying software updates and releases.
Collaboration.
Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability.
Documentation.
Maintain clear and concise documentation of systems, processes, and procedures.
Continuous Improvement.
Identify areas for improvement and implement changes to enhance system reliability and performance.

Skills and Qualifications

Cloud platform: Microsoft Azure.
Excellent knowledge of AKS (Azure Kubernetes Service).
Monitoring tools: Dynatrace Client, Grafana.
Operating systems: Windows, Linux.
Scripting: shell scripting, Python, PowerShell.
Databases: MySQL and Oracle SQL database management.
Container services: Kubernetes, Docker, Helm.
Understanding of Camunda is preferred.
