Site Reliability Engineer

E-Solutions

Toronto

On-site

CAD 85,000 - 115,000

Part time

Today

Job summary

A technology solutions company in Toronto is seeking a Site Reliability Engineer. The role involves providing hands-on support, developing SRE solutions, and ensuring compliance. Candidates should have a Bachelor's degree in a relevant field, experience in SRE, programming skills in Python, and hands-on experience with Azure. This is a contract position and requires availability for on-call duties.

Qualifications

  • Advanced knowledge in Programming & Scripting.
  • Experience in SRE or related fields required.
  • Hands-on experience with monitoring and observability tools needed.

Responsibilities

  • Provide hands-on SRE support including incident management.
  • Develop SRE solutions such as monitoring and alerting systems.
  • Ensure compliance including segregation of duties.

Skills

Programming & Scripting: Python
Cloud & OS: Azure, Linux
Monitoring & Observability: Dynatrace
Automation Tools: Ansible

Education

Bachelor's degree in Computer Science, Engineering, Mathematics, Physics

Tools

Dynatrace
Kafka
Shell scripting
Azure Monitor
Chaos Engineering

Job description

Role

Site Reliability Engineer

Location

Toronto, ON

Duration

Contract

Responsibilities
  • Site Reliability Engineering (SRE): Provide hands-on SRE support, including incident management, problem management, root cause analysis (RCA), monitoring, alerting, and infrastructure maintenance.
  • Track, audit, monitor, and implement technical work streams.
  • Act as Portfolio SME (Subject Matter Expert) to document common components, core functionalities, and infrastructure of supported applications.
  • Serve as an escalation point in on-call rotation; support maintenance, scheduled work, and release deployment requirements.
  • Lead incident and problem management for applications in scope and ensure RCA action items are fulfilled.
  • Drive continuous improvement, technical standards, and automation opportunities in monitoring, tooling, and productivity.
  • Manage technology currency, including server patching, certificate renewal, and compliance.
  • Research and implement best-in-class technical solutions relevant to the RBC environment and its needs.
  • Collaborate with unit, department, and enterprise teams to develop cross-enterprise solutions.
  • Engineering: Develop SRE solutions such as monitoring and alerting systems, machine learning anomaly detection, self-healing, and reliability testing.
  • Apply design-thinking and agile practices alongside SREs, Scrum Masters, and Incident Leads.
  • Contribute to and leverage SRE best practices.
  • Simplify development by building repeatable solutions to manual tasks.
  • Promote adoption of automation solutions for applications in scope.
  • Production Support: Perform production support roles, including off-hours support and rotational on-call duties.
  • Assist in incident and problem management for applications in scope.
  • Evaluate and improve processes to prevent future issues.
  • Ensure availability and uptime of applications as per Service Level Objectives (SLOs).
  • Ensure compliance, including segregation of duties.
  • Technical Consultation: Provide guidance for initiatives beyond the application or squad level.
  • Consult on product builds for other teams within RBPT and enterprise-wide.
  • Innovation and Learning: Stay updated on technology changes through formal training and self-learning.
  • Demonstrate new technology findings via team demos.
Must-Have Qualifications
  • Bachelor's degree in Computer Science, Engineering, Mathematics, Physics, or equivalent practical experience.
  • years of experience in SRE or related fields.
  • Advanced knowledge and hands-on experience with:
      • Programming & Scripting: Python, YAML, Shell scripting
      • Cloud & OS: Azure, Linux
      • Monitoring & Observability: Dynatrace, Prometheus, PagerDuty, Moog, Splunk, Elastic, Azure Monitor
      • Reliability Practices: Chaos Engineering
      • Messaging Systems: MQ, Kafka
      • Automation Tools: Ansible, Azure Automation, Catchpoint
      • Production support, including off-hours and on-call rotations
Additional Experience (Less Than Year)
  • Dynatrace
  • Kafka
  • Network programming (Perl, Python, Java, etc.)
  • Microsoft Azure