AIOps Engineer

Cynet Systems Inc

Toronto

On-site

CAD 80,000 - 100,000

Full time

Today

Job summary

A cloud services provider in Toronto is looking for a Cloud Operations Engineer to architect and implement RunOps frameworks and enhance BAU support models for cloud infrastructure. The ideal candidate will possess strong experience in AWS, Azure, GCP, and Infrastructure as Code tools like Terraform and Ansible. This role focuses on driving operational excellence through automation and AIOps adoption. The candidate should also have experience with scripting languages and observability tools.

Qualifications

  • Strong experience in cloud operations (AWS, Azure, GCP) and hybrid infrastructure management.
  • Expertise in Infrastructure as Code (Terraform, Ansible) and CI/CD pipelines.
  • Hands-on experience with observability and incident management tools.
  • Proficiency in scripting and automation (Python, Bash, PowerShell).
  • Familiarity with DevSecOps practices.

Responsibilities

  • Architect and implement RunOps frameworks for Day 2 operations.
  • Lead development and enhancement of L2/L3 support models.
  • Design and maintain automation modules using Infrastructure as Code.
  • Collaborate with SRE and application teams.
  • Drive adoption of AIOps and self-healing mechanisms.
  • Define operational SLAs, SLOs, and KPIs.

Skills

Cloud operations
Infrastructure as Code
CI/CD
observability
incident management
scripting
DevSecOps
GitHub Copilot
ITIL
operational governance

Education

Bachelor’s degree in Computer Science
Job Description
  • The Cloud Operations Engineer will architect and implement RunOps frameworks to support Day 2 operations, including monitoring, incident management, and automated remediation.
  • This role will lead the development and enhancement of BAU support models for cloud infrastructure and PaaS services, design automation modules using Infrastructure as Code (IaC), and collaborate with SRE and application teams to ensure operational readiness.
  • The role drives the adoption of AIOps, self-healing mechanisms, and continuous improvement initiatives to improve system reliability.
Responsibilities
  • Architect and implement RunOps frameworks for Day 2 operations, including monitoring, incident management, and automated remediation.
  • Lead the development and enhancement of L2/L3 support models for cloud infrastructure (Windows/Linux) and PaaS services.
  • Design and maintain automation modules for platform features (Day 1) and resiliency features (Day 2) using Infrastructure as Code (IaC).
  • Collaborate with SRE and application teams to ensure operational readiness and observability integration across environments.
  • Drive adoption of AIOps and self-healing mechanisms to reduce MTTR and improve system reliability.
  • Define and enforce operational SLAs, SLOs, and KPIs to measure and improve service performance.
  • Provide technical leadership in root cause analysis, post-incident reviews, and continuous improvement initiatives.
Requirements/Must Have
  • Strong experience in cloud operations (AWS, Azure, GCP) and hybrid infrastructure management.
  • Expertise in Infrastructure as Code (Terraform, Ansible, Client, etc.) and CI/CD pipelines.
  • Hands-on experience with observability and incident management tools (e.g., New Relic, PagerDuty, Client).
  • Proficiency in scripting and automation (Python, Bash, PowerShell).
  • Familiarity with DevSecOps practices and integration of security into operational workflows.
  • Working knowledge of internal developer platforms (IDPs) and developer enablement tools.
  • Practical experience using GitHub Copilot for automation and code generation.
  • Strong understanding of ITIL processes, service management, and operational governance.
Preferred Qualifications
  • Certifications in cloud platforms (AWS/Azure/GCP Architect or DevOps Engineer).
  • Experience with AIOps platforms and event correlation engines.
  • Exposure to compliance and regulatory requirements in financial or insurance sectors.
Experience
  • Proven hands-on experience managing cloud operations, automation, and observability frameworks.
  • Experience in driving operational excellence and incident response in enterprise environments.
Skills
  • Cloud operations, IaC, CI/CD, observability, incident management, scripting, DevSecOps, GitHub Copilot, ITIL, operational governance.
Qualification And Education
  • Bachelor’s degree in Computer Science, Information Technology, or a related field preferred.