Cloud Site Reliability Engineer - AZURE (32286)

ZipRecruiter

Toronto

On-site

CAD 120,000 - 150,000

Full time

Today

Job summary

A leading company is seeking a Cloud Site Reliability Engineer to enhance cloud infrastructure and applications. This role involves strategic initiatives, mentoring junior engineers, and leading projects to ensure operational excellence in a dynamic environment. The ideal candidate will possess extensive experience in cloud technologies and incident management, driving innovative solutions to meet organizational goals.

Qualifications

  • 12+ years of experience in cloud support or operations.
  • Expertise in Microsoft Azure or equivalent cloud platforms.

Responsibilities

  • Lead and resolve complex technical issues involving the Azure cloud environment.
  • Conduct Root Cause Analysis (RCA) for high-severity incidents.
  • Architect and optimize cloud infrastructure for performance and scalability.

Skills

Leadership
Incident Management
Cloud Technologies
Collaboration
Automation

Education

Bachelor's degree in Computer Science
Bachelor's degree in Engineering

Tools

Azure
AKS
OpenShift
Azure DevOps
Azure Insights
Grafana
PowerShell
Python
ServiceNow

Job description

Lead strategic initiatives to ensure the reliability, scalability, and performance of our cloud infrastructure and applications. This advanced role requires expertise in cloud technologies, strategic planning, and incident management to drive innovative solutions and operational excellence.

As a Cloud Site Reliability Engineer (CSRE), you will influence cloud reliability strategies, mentor junior engineers, and lead impactful projects. This position reports directly to the VP of Cloud Services and requires a proactive, collaborative approach to meet operational and strategic goals.

Responsibilities

  • Lead and resolve complex technical issues involving our client's products and the Azure cloud environment.
  • Design and implement operational enhancements to improve resiliency and system reliability.
  • Conduct Root Cause Analysis (RCA) for high-severity incidents and lead initiatives to prevent recurrence.
  • Represent the organization in external client escalation calls, providing guidance and solutions.
  • Architect and optimize cloud infrastructure for performance, scalability, and cost-efficiency.
  • Manage and scale container orchestration platforms such as AKS and OpenShift.
  • Implement advanced monitoring solutions and integrate predictive analytics for proactive issue resolution.
  • Develop automation strategies to streamline operations and incident responses.
  • Maintain documentation of cloud architectures, processes, and incident strategies.
  • Mentor and coach junior engineers, fostering continuous learning and innovation.
  • Drive strategic initiatives through collaboration with cross-functional teams.

Must Have

  • Bachelor's degree in Computer Science, Engineering, or related field.
  • 12+ years of experience in cloud support or operations.
  • Expertise in Microsoft Azure or equivalent cloud platforms.
  • Experience with container orchestration systems like AKS or OpenShift.
  • Leadership in managing automated deployment pipelines, including Azure DevOps.
  • Proficiency with enterprise monitoring platforms (e.g., Azure Insights, Grafana) and predictive analytics tools.
  • Advanced scripting skills with PowerShell, Python, or similar.
  • Experience in incident management and defining SLAs for global environments.
  • Knowledge of database management, especially PostgreSQL.

Nice to Have

  • Advanced certifications in cloud platforms (e.g., Azure Solutions Architect Expert).
  • Experience with ITSM tools like ServiceNow.
  • Understanding of security and compliance in cloud environments.
