

AI Inference Reliability Tech Lead — Scale & Resilience

Cerebras

Canada

On-site

CAD 100,000 - 130,000

Full time

22 days ago


Job summary

A leading AI technology firm in Canada is looking for a Reliability Tech Lead to ensure the reliability of its AI inference service. You will define reliability strategy, lead incident management, and collaborate across multiple teams, with a focus on large-scale distributed systems. Candidates should have 7+ years of experience in backend, infrastructure, or reliability engineering and strong programming skills. Join us to shape the future of AI technology.

Qualifications

  • 7+ years of experience in backend, infrastructure, or reliability engineering for large-scale distributed systems.
  • Strong programming skills in Python, C++, Go, or Rust.
  • Deep experience with SLO/SLI/SLA design, incident response, and postmortem culture.

Responsibilities

  • Define and drive the reliability strategy and establish SLOs.
  • Design and implement reliability mechanisms across data centers.
  • Lead large-scale incident management and postmortems.
  • Collaborate with software, infrastructure, and hardware teams.

Skills

Backend programming
Reliability engineering
Incident response
Cross-functional leadership

Education

Bachelor's or master's degree in computer science or a related field