
Senior Site Reliability Engineer

Orion Health group of companies

Toronto

On-site

CAD 80,000 - 110,000

Full time

3 days ago

Job summary

Orion Health is seeking a Site Reliability Engineer to enhance its managed services unit in Toronto. The role focuses on optimizing system performance, automating infrastructure tasks, and ensuring reliability in cloud-hosted environments. Ideal candidates have strong technical skills, a proactive improvement mindset, and a Bachelor’s degree, and will contribute to revolutionizing global health systems.

Qualifications

  • 4–6 years in a Site Reliability Engineering or equivalent role.
  • Experience in supporting cloud-based production systems.
  • Strong scripting background with experience in object-oriented programming.

Responsibilities

  • Collaborate on building automation for infrastructure and software delivery.
  • Participate in the daily management of multiple Orion Health solutions hosted in AWS Cloud.
  • Document procedures and processes to facilitate learning and knowledge transfer.

Skills

Proactive approach
Software engineering principles
Cloud technologies
Scripting automation
Networking

Education

Bachelor’s Degree in a technical discipline
Technical certification in System Administration, Cloud Engineering, or DevOps

Tools

PowerShell
Python
Bash
Puppet
Ansible
Kubernetes
CloudFormation
Terraform
Splunk monitoring tools

Job description

Do you want to work for a company that is innovating and making a difference to the health and wellbeing of people all over the world? We’re not about selling meaningless, unnecessary products for corporate profitability. You’ll be working on technology that will revolutionise global health systems so that we can finally get the healthcare we all want: a basic human right.

We like to think of ourselves as a community of start-ups where you can be your true, genuine self. Each of our product teams has the autonomy to decide how they operate and contribute towards our mission of providing each person with the right care at the right time and in the right place.

Orion Health is excited to be expanding our galaxy by recruiting for a number of stellar individuals to join our team to help us deliver to our global customer base. If you want to climb aboard the rocketship and help us revolutionise global health systems, astronomical opportunities await.

Position Purpose:

Collaborate on building automation for infrastructure and software delivery, act as the primary executor of those processes, and collect feedback from the support of operational sites. Responsible for availability, latency, reliability, performance, efficiency, change management, monitoring, emergency response, capacity planning, and improving system availability.

Success in this Role looks like…

  • Through a proactive approach, relentless improvement and constant training, the SREs run the customer environments by monitoring availability and taking a holistic view of system health
  • SLAs are always met through automation with little or no involvement from the team, and the number of customers and services provided can scale independently of the size of the team
  • Bridge the gap between development and operations
  • Well-built software and systems to manage platform infrastructure and applications
  • Measured and optimised system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve

Business Unit:

North American Managed Services

  • This unit contributes to Orion Health’s purpose of enabling client success by introducing and maintaining managed environments, policies, and procedures aligned with ITIL standards, while maintaining focus on all elements of support for our customers

Key Relationships: Internal

  • Technical Operations Leads, Implementation Consultants, Solution Architects, Service Management Lead, Service Operations, Product Team, and Database Administrators
  • SREs must have constant communication with the Development Team and Technical Leads (Software Designers) to understand the concrete requirements of the products and the configuration required to be part of the automation

External

  • Clients and third-party vendors

Essential functions:

Operations Support and Issue Resolution

Participate in the daily management of multiple Orion Health solutions hosted in AWS Cloud, along with their infrastructure and networking, including but not limited to:

  • Daily monitoring and alert response: identifying potential problems and implementing alerts to notify relevant parties
  • Following a Change Request from creation to completion, including review, detailed validation, and execution of all tasks
  • Working with other teams to ensure smooth and reliable releases
  • Tuning the application stack to improve stability and resultant uptime metrics
  • Automating repetitive tasks, such as deployment, scaling, and patching, to improve efficiency and reduce manual effort
  • Investigating and resolving acute and recurring issues
  • Performance trend analysis: identifying and addressing performance bottlenecks to ensure systems can handle expected loads and user traffic
  • Log analysis and error resolution
  • Managing and maintaining the underlying infrastructure, including servers and networks, to ensure smooth operations
  • Handover testing
  • Documenting procedures and processes to facilitate learning and knowledge transfer within the team
  • Root Cause Analysis: investigating and resolving incidents, including outages and performance problems, to minimize disruption
  • Planning for future capacity needs to ensure systems can handle increasing demand
  • Developing and testing disaster recovery plans to guarantee data integrity, system resilience, and swift restoration of services in case of critical incidents
  • Coordinating with teams to maintain Service Level Agreements

Internal Development

Responsible for the continuous integration of updates to more than 10 products/solutions, released by Development teams, into Orion Health solutions.

Build secure and scalable infrastructure to manage customer data

Internal Support

  • Participate in the on-call rotation
  • Work with Development, Solution Adoption, Managed Services, Professional Services, Support, and other teams to provide clients with a world-class, stable solution platform

Behavioural and Technical Capabilities

  • Highly proactive and motivated Software/System Engineer, always seeking opportunities for improvement and taking ownership of challenges
  • Strong understanding of software engineering principles, operating systems (Windows and Linux), networking, and cloud technologies
  • Experienced in Windows and Linux OS administration, with hands-on exposure to data center operations
  • Proficient in Active Directory, Group Policy Object (GPO) management, DNS, and Active Directory service health monitoring
  • Demonstrated scripting and automation experience using PowerShell, Python, Bash, and other languages
  • Familiarity with infrastructure automation tools such as Puppet and Ansible is a plus
  • Capable of communicating ideas and collaborating productively across technical teams
  • Committed to continuous learning and knowledge sharing within the team
  • Ability to design secure distributed web services and manage network security at scale
  • Solid understanding of TCP/IP, DNS, DHCP, VLANs, VPNs, firewall configuration, load balancers, and other network appliances
  • 4–6 years in a Site Reliability Engineering or equivalent role
  • 5 years in systems/application support and/or development
  • Strong scripting background with experience in object-oriented and structured programming
  • Experience with automation, infrastructure as code, and orchestration (e.g., Puppet, Ansible, Kubernetes, CloudFormation, Terraform)
  • Exposure to on-prem to AWS cloud migration projects and Red Hat OS upgrades is an asset
  • Working knowledge of Splunk monitoring tools and strategies
  • Strong foundation in Network Architecture and Security
  • Experience with CI/CD pipelines and deployment automation in cloud environments (AWS preferred)

Education & Qualifications:

  • Bachelor’s Degree in a technical discipline or equivalent experience
  • Experience in supporting cloud-based production systems
  • A technical certification in System Administration, Cloud Engineering, or DevOps
  • Formal training and certification in *nix scripting, NoSQL/SQL and Oracle databases, Big Data technologies, and AWS Cloud Services is a plus


