Site Reliability Engineering Specialist

Telesat Corporation

Ottawa

Hybrid

CAD 100,000 - 125,000

Full time

29 days ago

Job summary

An established industry player is on the lookout for a Site Reliability Engineering Specialist to enhance the reliability and performance of their infrastructure. This role requires a proactive individual with extensive experience in cloud environments, particularly with Kubernetes and Microsoft Azure. The successful candidate will be responsible for automating operational tasks, monitoring system health, and collaborating with development teams to ensure uptime and performance. Join a forward-thinking company that values innovation and offers a dynamic work environment where your contributions will make a significant impact on global communications solutions.

Qualifications

  • 9+ years of IT operations experience with a focus on reliability and performance.
  • 5+ years of hands-on experience with Microsoft Azure, plus experience operating Kubernetes.

Responsibilities

  • Maintain high availability and resiliency of Kubernetes-based infrastructure.
  • Automate operational tasks and monitor platform health.

Skills

Cloud Environments
Automation
Incident Response
System Optimization
Problem-Solving

Education

Bachelor's Degree in Computer Science or a related field

Tools

Kubernetes
Terraform
Ansible
Prometheus
Grafana
Nagios
Splunk

Job description

Telesat (NASDAQ and TSX: TSAT) is a leading global satellite operator, providing reliable and secure satellite-delivered communications solutions worldwide to broadcast, telecommunications, corporate and government customers for over 50 years. Backed by a legacy of engineering excellence, reliability and industry-leading customer service, Telesat has grown to be one of the largest and most successful global satellite operators.

Telesat Lightspeed, our Low Earth Orbit (LEO) satellite network, is scheduled to begin service in 2027 and will revolutionize global broadband connectivity for enterprise users by delivering a combination of high capacity, security, resiliency and affordability with ultra-low latency and fiber-like speeds. Telesat is headquartered in Ottawa, Canada, and has offices and facilities around the world.

The company’s state-of-the-art fleet consists of 14 GEO satellites, the Canadian payload on ViaSat-1 and one LEO 3 demonstration satellite. For more information, follow Telesat on X and LinkedIn or visit www.telesat.com.

Position Overview

We are seeking a Site Reliability Engineering Specialist to ensure the reliability, performance, and scalability of our infrastructure. The ideal candidate will have extensive experience in cloud environments, automation, and monitoring, with a strong focus on incident response and system optimization. Excellent problem-solving skills and a proactive approach to maintaining system health are essential.

Responsibilities
  • Work closely with Telesat's cloud engineers to deploy and maintain our Kubernetes-based infrastructure.
  • Help maintain high availability, uptime and resiliency of our infrastructure.
  • Perform day-to-day operational tasks such as upgrades and patching of the Kubernetes platform.
  • Automate operational tasks.
  • Monitor the health of the platform and applications using Telesat's observability platform.
  • Improve observability, and define and measure service level objectives (SLOs).
  • Collaborate with development teams to resolve application issues.
  • Be available for on-call duty, respond to automated alerts, and execute playbooks.
  • Identify process gaps, and build or improve tools to support incident management.
  • Facilitate incident response and conduct root cause analysis.

Education and Experience Required:
  • Bachelor's Degree in Computer Science or a related field.
  • A minimum of nine years of experience in IT operations with a focus on reliability, uptime, availability and performance.
  • At least five years of demonstrable hands-on experience with Microsoft Azure, including deployment, management, and monitoring.
  • Expertise in automation and configuration management, with demonstrable experience using tools such as Terraform and Ansible to automate infrastructure and application deployment.
  • Strong understanding of monitoring and observability, with proven experience using tools such as Prometheus, Grafana, Nagios, or Splunk, and the ability to implement and maintain observability solutions.
  • A CNCF Certified Kubernetes Administrator (CKA) certification is considered an asset for this role.

At Telesat, we take pride in being an equal opportunity employer that values equality in the workplace. We are committed to providing the best possible candidate experience, including any required accommodations at every stage of our interview process. Qualified applicants selected for an interview who require accommodations are asked to inform the Telesat Talent team. We will work with you to meet your needs. All accommodation information provided will be treated as confidential.

Similar jobs

Lead, Site Reliability Engineering, Infrastructure Security

MongoDB

Montreal

Remote

CAD 100,000 - 125,000

30+ days ago

Lead, Site Reliability Engineering, Infrastructure Security

MongoDB

Old Toronto

Remote

CAD 90,000 - 150,000

30+ days ago
