Remote Site Reliability Engineer II - Observability & Uptime

OpenTable

Toronto

Hybrid

CAD 110,000 - 130,000

Full time

8 days ago

Job summary

A prominent online reservation platform in Toronto is seeking a Site Reliability Engineer to maintain its observability systems and ensure the uptime of critical logging and metrics systems. The role starts remote and will transition to a hybrid model in the future. Candidates should have at least 3 years of observability experience and a strong background in AWS and related tools. Benefits include generous paid vacation and mental health support.

Benefits

Generous paid vacation
Work from (almost) anywhere for up to 20 days/year
Mental health support
Paid parental leave
Development dollars for career growth
Travel discounts
Private health and dental insurance

Qualifications

  • Good verbal communication and written documentation skills.
  • Proven experience in an SRE or related role.
  • Excitement to learn new technologies and stacks.
  • Solid understanding of DNS, TCP/IP, Linux Server Administration, and shell scripting.
  • Experience supporting services across VMs, Docker Containers, and Kubernetes.
  • Strong practical experience with AWS services.
  • Production-level experience implementing and maintaining GitOps practices.
  • Experience with automation tools like Terraform is desirable.
  • 3+ years of experience in observability.

Responsibilities

  • Design, implement, and maintain observability systems.
  • Ensure uptime of logging and metrics systems across regions.
  • Help engineering teams maximize value from metrics and logs.
  • Troubleshoot across different systems.
  • Collaborate with global teams via Zoom and Slack.
  • Define priorities and set delivery goals with leadership.
  • Participate in a 12-hour on-call rotation.
  • Automate processes to reduce emergency calls.

Skills

Verbal communication skills
Written documentation skills
Experience in an SRE role
Linux Server Administration
Shell scripting
Docker Containers
Kubernetes
AWS services
GitOps practices experience
Terraform
Production-level observability experience

Tools

AWS
Loki
Prometheus
Mimir
Tempo
OpenTelemetry Collector