Site Reliability Engineer / Platform Operations Engineer
Targeted Talent
Vancouver, Winnipeg, Montréal
Remote
CAD 80,000 - 110,000
Full-time
Posted 6 days ago
Job summary
A leading global tech firm is seeking an experienced Site Reliability Engineer to lead projects and enhance operational responses. You will design Wargames, troubleshoot production issues, and mentor team members while managing AWS platforms. Ideal candidates will have strong troubleshooting, AWS, and Java expertise. This role offers competitive salary and great perks, initially remote with relocation to Calgary or Winnipeg.
Benefits
Competitive salary
Great perks
Responsibilities
Own development projects and deliver against the engineering roadmap.
Design and implement Wargames to test operational responses.
Act as technical escalation for SOC engineers during major incidents.
Troubleshoot and mitigate issues in production environments.
Mentor team members.
Operate global AWS Platforms at scale.
Skills
Troubleshooting
Problem-solving
Investigative skills
Experience of AWS
Java development
Incident management
Distributed web applications
Automating tasks
Understanding of data structures
Mentoring
Identifying improvements
Tools
Ansible
Terraform
Python
ELK
Prometheus
Grafana
Job description
We are looking for an experienced Site Reliability Engineer or Platform Operations Engineer for our client. This is a permanent position that is remote to start, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used.
You Will:
Own development projects, providing technical guidance and delivering against the Platform & Service Operations Engineering roadmap.
Design and implement Wargames to test our operational response and identify areas of weakness in our platforms.
Act as the technical and management escalation point for Service Operations Centre (SOC) engineers and during major incidents.
Troubleshoot, reproduce, and mitigate issues in our production environments.
Mentor other team members.
Operate global AWS Platforms at scale.
You Have:
Evidence of strong troubleshooting, problem-solving, and investigative skills
Experience of AWS or other cloud providers
Experience developing in Java
Experience of major incident management and of operating production platforms at scale
Experience working with distributed web applications
Experience automating operational tasks and processes using other languages
Understanding of relational and/or NoSQL data structures
Experience mentoring and influencing peers
Experience identifying improvements, weighing risks against benefits, and translating them into technical requirements
Bonus:
Experience working with Ansible, Terraform, or Python
Experience working with serverless platforms or containers
Experience of ELK and/or Graphite, Prometheus, or Grafana
Experience using tracing tools in production
Experience in chaos engineering or failure injection testing
Experience of working in an Agile environment
Experience working in a similar site reliability role
This role offers great perks and a competitive salary. Please apply to the job posting if it matches your career path!
* The reference salary is based on the target salaries of market leaders in their respective sectors. It is intended as a guide to help Premium members evaluate job openings and to support salary negotiations. The reference salary is not provided directly by the company and may be considerably higher or lower.