Site Reliability Engineer / Platform Operations Engineer

Targeted Talent

Canada

Remote

CAD 80,000 - 100,000

Full time

7 days ago

Job summary

A global enterprise company is seeking a Site Reliability Engineer to lead development projects and ensure robust operational response. This permanent role starts remotely with future relocation to Calgary or Winnipeg. Ideal candidates will have strong AWS experience, Java development skills, and a history of managing incidents in production environments. Competitive salary and great perks offered.

Benefits

Competitive salary

Great perks

Qualifications

Proven troubleshooting, problem-solving, and investigative skills.
Experience with AWS or other cloud providers.
Strong development experience in Java.
Experience managing major incidents on production platforms.

Responsibilities

Own development projects and provide technical guidance.
Design and implement Wargames to test operational responses.
Serve as the technical and management escalation point during incidents.
Troubleshoot and mitigate production issues.

Skills

Troubleshooting skills

AWS experience

Java development

Incident management

Distributed web applications

Automation of operational tasks

Data structures understanding

Mentoring skills

Risk assessment

Tools

Ansible

Terraform

Python

ELK

Grafana

Tracing Tools

Overview

We are looking for an experienced Site Reliability Engineer or Platform Operations Engineer for our client. This is a permanent position that is remote to start, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used.

You Will

Own development projects, providing technical guidance and delivering against the Platform & Service Operations Engineering roadmap.
Design and implement wargames to test our operational response and identify areas of weakness in our platforms.
Act as the technical and management escalation point for Service Operations Centre (SOC) engineers and during major incidents.
Troubleshoot, reproduce, and mitigate issues in our production environments.
Mentor other team members.
Operate global AWS platforms at scale.

You Have

Evidence of strong troubleshooting, problem-solving and investigative skills
Experience with AWS or other cloud providers
Experience developing in Java
Major incident management experience operating production platforms at scale
Experience working with distributed web applications
Experience automating operational tasks / processes using other languages
Understanding of relational and/or NoSQL data structures
Experience mentoring/influencing peers
Experience identifying improvements, weighing risks against benefits, and translating them into technical requirements

Bonus

Experience with Ansible, Terraform, and Python
Experience working with serverless/containers
Experience with ELK and/or Graphite/Prometheus/Grafana
Experience using tracing tools in production
Experience in chaos engineering/failure injection testing
Experience working in an Agile environment
Experience working in a similar site reliability role

This role offers great perks and a competitive salary. Please apply to the job posting if it matches your career path!