Cloud Site Reliability Engineer

Smile Digital Health

Canada

Remote

CAD 100,000 - 120,000

Full time

Today

Job summary

A leading health tech provider in Canada is seeking a Cloud Site Reliability Engineer to ensure the reliability and performance of cloud services. This role involves automating performance testing, collaborating closely with engineering teams, and meeting strict SLAs. Ideal candidates will have expertise in cloud service management, strong troubleshooting skills, and familiarity with infrastructure automation tools. The role offers a competitive salary and a remote work environment.

Benefits

Remote Work Environment
Flexible Time Away From Work Policy
Competitive Salary and Health/Medical Benefits
RRSP/TFSA/401K Employee Contribution
Life and Disability
Employee Assistance Program
FHIR Study Program and Skillsoft Learning
Super HAPI Fun Club

Qualifications

  • Demonstrated expertise with cloud service providers, preferably managing Azure.
  • Experience working with microservices architecture focused on Java-based services.
  • Skilled in troubleshooting performance and resource allocation.

Responsibilities

  • Ensure reliability and scalability of services across cloud platforms.
  • Collaborate with security teams for best practices.
  • Design performance testing frameworks and maintain infrastructure.

Skills

Cloud service providers expertise
Troubleshooting performance issues
Infrastructure as code, automation
Familiarity with performance testing methodologies
Proficiency in Terraform
Experience with Kubernetes

Tools

Terraform
Ansible
Grafana
Elastic stack

Job description

Working for a company like Smile Digital Health means supporting our mandate for #BetterGlobalHealth. We strive towards this goal every day, and the results can be seen in the impact of our innovative health data platform and data management solutions, which are used in over 20 countries. We were #19 on Deloitte's Technology Fast 50 Ranking for 2024!

Smile Digital Health makes it easy for healthcare stakeholders to collect and exchange data with our leading FHIR-based data liberation platform.

At its heart, the Smile platform enables people and organizations to better manage healthcare data. We help generate and liberate structured healthcare data to ensure effective delivery across care teams and health systems, bringing #BetterGlobalHealth to patients every day!

Apply today and find plenty of reasons to SMILE!

The Cloud Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of production‑grade services deployed across multiple cloud vendors and infrastructure platforms for Smile Digital Health, its clients, and partners. This role designs and automates performance testing frameworks, integrates them into CI/CD pipelines, and uses observability tools to proactively detect and resolve bottlenecks. Working closely with engineering, product, and security teams, the SRE ensures systems meet strict SLAs for performance and availability while driving continuous optimization across multiple cloud platforms.

Responsibilities:
  • Collaborate with our Security Operations teams to help define and implement best practices around Cloud Service Provider configuration for Azure and other cloud providers.
  • Develop, implement, and coordinate a multi-tenant approach to service offerings such as databases, the container platform, authentication, certificates, and product registries.
  • Design and maintain performance testing strategies, frameworks, and environments in the cloud.
  • Develop and maintain cost/utilization tracking and attribution processes for all Cloud Service Providers.
  • Create documentation around Cloud Service Provider offerings detailing use cases, best practices, and implementation details.
  • Develop and maintain technical relationships with our core Cloud Service Providers.
  • Implement and maintain a secure and scalable infrastructure platform for delivering Cloud Services applications.
  • Ensure that internal and external SLAs are met or exceeded, and that system-centric KPIs are continuously monitored and improved.
  • Create tools for automating deployment, monitoring and operations of the overall platform.
  • Participate in an on‑call rotation to provide application support, incident management, and troubleshooting.
  • Provide ongoing maintenance and support of internal tools, improve system health and reliability.
  • Assist customers with on‑site deployments when needed.
  • Implement and manage observability tools (logging, metrics, tracing) for performance insights; OpenTelemetry (OTel) and the Grafana stack are preferred.
  • Comply with organizational policies, procedures, and practices (including but not limited to security policies); this is an ongoing requirement of the employment or contractual agreement.
  • Accurately report all working hours in the time tracking system on a daily or weekly basis, track the majority of hours (if not all) as billable, and make full and proper use of the project management tool in the time tracking system.
  • Tracking and reporting of billable hours is a critical aspect of project management and delivery to our customers and this is a major area of accountability.
  • Comply with the privacy, security and confidentiality policies. Hold all confidential information in trust and strict confidence and ensure that it shall be used only for the purposes required to fulfill employment obligations, and shall not be used for any other purpose, or disclosed to any third party.

Requirements:
  • Demonstrated expertise with cloud service providers and best practices for implementation and configuration, preferably managing Azure on behalf of multiple teams for a company that delivers SaaS products.
  • Experience with Kubernetes, OpenShift, Kafka, and the Elastic stack.
  • Proven experience working with microservices architecture, with a strong focus on Java‑based services.
  • Experience in applying chaos engineering practices to evaluate and enhance system resiliency.
  • Skilled in troubleshooting performance issues, including analyzing time consumption, allocating resources, and recommending optimizations.
  • Familiar with performance testing methodologies and tools to assess system behavior under load.
  • Proven experience with Security and Compliance (SOC2, HIPAA, ISO27001) best practices and how to implement controls that support high‑velocity software delivery teams.
  • Proficiency in Terraform, Ansible, or Chef.
  • Expertise in troubleshooting, support escalation, on‑call process optimization, and documenting knowledge.
  • Passionate about Infrastructure as code, automation, and developing solutions that help developers move quickly and safely.
  • Familiarity with infrastructure management and operations lifecycle concepts and ecosystem.
  • Experience operating and maintaining production systems in a Linux and public cloud environment.
  • Prior experience working with high‑performance or distributed systems, although we strive to hire at a variety of experience levels.
  • Working knowledge of industry best practices regarding information security.
  • Previous experience building or maintaining a large‑scale cloud service.
  • Proven ability to prioritize and track multiple projects in parallel.
  • Proven ability to be highly responsive and customer‑focused.

CAD $100,000 - $120,000 a year

Some of the benefits we offer:
  • Remote Work Environment
  • Flexible Time Away From Work Policy including PTO, Personal and Sick Days
  • Competitive Salary and Health/Medical Benefits
  • RRSP/TFSA/401K Employee Contribution
  • Life and Disability
  • Employee Assistance Program
  • FHIR Study Program and Skillsoft Learning
  • Super HAPI Fun Club

Smile's core values include respect, inclusion, embracing our differences, and celebrating shared values because our people are the foundation of our success. We are big on creating a sense of belonging and empowering each other to bring our authentic selves to work. We are dedicated to fostering a workplace that values diversity, equity, and inclusion.

We welcome and encourage candidates of all backgrounds to apply. Candidates are encouraged to inform us if they wish to discuss or require accommodations during interviews or while working at Smile.
