SRE Coach

Randstad Canada

Mississauga

Hybrid

CAD 80,000 - 120,000

Full time

5 days ago

Job summary

An innovative firm is seeking an experienced SRE Coach to lead teams in adopting Site Reliability Engineering best practices. This role offers the chance to collaborate with engineering leads, mentor teams, and drive a culture of reliability and resilience. You will work with cutting-edge cloud technologies and observability tools while influencing the broader SRE transformation across the enterprise. If you are passionate about coaching and improving systems, this is a fantastic opportunity to make a significant impact in a hybrid work environment.

Benefits

Work with a leading employer
Flexible work environment
Opportunity for remote work

Qualifications

  • 5+ years in Site Reliability Engineering with expertise in distributed systems.
  • Strong experience in cloud infrastructure and observability tooling.

Responsibilities

  • Coach teams on SRE best practices and build a culture of resilience.
  • Collaborate with engineering leads to customize SRE patterns and automation.

Skills

Site Reliability Engineering
Cloud Infrastructure
Linux/Windows Environments
Infrastructure as Code
Observability Tooling
CI/CD Pipelines
Coaching and Mentoring
Agile Principles
Bilingual (French and English)

Tools

AWS
Azure
GCP
Terraform
Prometheus
Grafana
Datadog
GitLab CI
Jenkins

Job description

Are you an SRE Coach looking for a new contract opportunity?

We are pleased to present a new contract opportunity for you to consider: SRE Coach

- Start: ASAP
- Estimated length: 12 months
- Location: Mississauga
- Hybrid role: 2 to 3 days a week in the local area office; fully remote could be considered for the right candidate.

Advantages
You will have an opportunity to work with a leading employer in the local market.

Responsibilities

  1. Coach teams to adopt SRE best practices and build a culture of resilience, observability, and reliability.
  2. Guide teams in defining and using SLIs, SLOs, and error budgets to drive engineering priorities.
  3. Collaborate with platform and engineering leads to customize Golden SRE Patterns, including monitoring, alerting, runbooks, and automation frameworks.
  4. Provide coaching on incident response, blameless postmortems, and improving Mean Time to Recovery (MTTR).
  5. Mentor engineering teams and product owners on reliability-focused delivery, with an emphasis on observability, failure mode analysis, and shift-left reliability practices.
  6. Work with teams to identify and reduce manual toil by advocating for automation-first solutions.
  7. Partner with security, compliance, and infrastructure stakeholders to embed secure, compliant reliability into CI/CD and runtime environments.
  8. Create tailored training content and workshops to support SRE maturity and team self-sufficiency.
  9. Participate in broader SRE transformation initiatives across the enterprise.

Qualifications
  1. 5+ years of experience in a Site Reliability Engineering or platform engineering role, with deep understanding of distributed systems and service operations.
  2. Strong experience in cloud infrastructure (AWS, Azure, GCP, OpenStack, VMware) and Linux/Windows environments.
  3. Experience with Infrastructure as Code (Terraform, CloudFormation, Ansible) and automated configuration management.
  4. Hands-on knowledge of observability tooling (Prometheus, Grafana, Datadog, ELK, etc.), incident response systems (PagerDuty, OpsGenie), and reliability metrics.
  5. Strong understanding of CI/CD pipelines, automation tooling (GitLab CI, Jenkins, ArgoCD, etc.), and integrating security and compliance in delivery workflows.
  6. Proven ability to coach teams on SRE principles such as toil reduction, error budget management, capacity planning, and chaos engineering.
  7. Experience driving blameless culture, technical postmortems, and facilitating continuous learning from incidents.
  8. Strong communication and mentoring skills, with the ability to work across teams, from developers to leadership, to drive a shared reliability mindset.
  9. Understanding of Agile principles and how to incorporate SRE within Agile/DevOps/SAFe delivery models.
  10. A problem-solver who thinks in systems, seeks continuous improvement, and embraces change.
  11. Bilingual French and English (written and verbal) is an asset.

Summary
Do you have this experience? If you answer YES, then please apply IMMEDIATELY to discuss your experience and interest in this opportunity!