Team Lead, HPC System Operations

Telesat

Ottawa

On-site

CAD 120,000 - 200,000

Full time

Job summary

A leading global satellite operator seeks a Team Lead for HPC System Operations in Ottawa. The role focuses on the technical leadership and support for high-performance compute environments, requiring extensive experience in Linux systems and HPC tools. Join Telesat to drive innovation in global satellite communications with a collaborative team dedicated to engineering excellence.

Benefits

Equal opportunity employer
Promotes a collaborative work environment

Qualifications

  • Minimum of 5 years in IT with a related University Degree, or 7 years in IT with a three-year College Diploma.
  • Industry certifications like MCSE or CISSP are strong assets.
  • In-depth experience with Linux platforms in Enterprise environments.

Responsibilities

  • Manage daily operations and support of the HPC environment.
  • Identify, diagnose, and resolve software and hardware issues.
  • Implement patches and upgrades to systems as necessary.

Skills

Problem-solving
Analytical skills
Interpersonal skills
Scripting (Bash)
Technical communication

Education

Diploma or Degree in Computer Science

Tools

Linux (Ubuntu/RedHat)
Slurm
OpenHPC
KVM or other hypervisors

Job description

Telesat (NASDAQ and TSX: TSAT) is a leading global satellite operator, providing reliable and secure satellite-delivered communications solutions worldwide to broadcast, telecommunications, corporate and government customers for over 50 years. Backed by a legacy of engineering excellence, reliability and industry-leading customer service, Telesat has grown to be one of the largest and most successful global satellite operators.

Telesat Lightspeed, our Low Earth Orbit (LEO) satellite network, is scheduled to begin service in 2027 and will revolutionize global broadband connectivity for enterprise users by delivering a combination of high capacity, security, resiliency and affordability with ultra-low latency and fiber-like speeds. Telesat is headquartered in Ottawa, Canada, and has offices and facilities around the world.

The company’s state-of-the-art fleet consists of 14 GEO satellites, the Canadian payload on ViaSat-1 and one LEO 3 demonstration satellite. For more information, follow Telesat on X and LinkedIn or visit www.telesat.com

Reporting to the Manager, Network and Telecom, the incumbent provides the technical leadership and specialist expertise required for the operation and support of the Constellation Management System/System Model running within a high-performance compute (HPC) environment. The candidate’s primary focus is to monitor, maintain, troubleshoot and support the HPC nodes that are integral to the day-to-day operation of the company. These activities include managing the hardware, installing and configuring software, and optimizing and managing the environment. Other activities include day-to-day operational requests such as migration of nodes, system access requests and resolution of security alerts, as well as second-level problem assessment, triage, research, and resolution of incidents and requests. The incumbent must be capable of applying technical expertise at a superior level, and assists with the creation and publication of end-user documentation as new technology is released and systems are migrated.

Responsibilities

  • Identify, diagnose, and resolve level two problems for users of the software and hardware, LAN and WAN, VPN, the Internet, mobile devices, and new computer technology; communicate solutions to end-users.
  • Respond to more complex issues (second line support) escalated by the first line support using problem-solving skills and analysis to identify root causes of issues, determine course of action and propose creative solutions.
  • Manage day-to-day operations and support of the HPC environment (Linux).
  • Take ownership of capacity, availability and performance of the HPC cluster(s).
  • Support end users in the submission and management of jobs based on Slurm and OpenHPC.
  • Migrate existing nodes as required to Linux.
  • Implement and manage a system based on Foreman or similar to manage patching and oversee cluster management.
  • Implement patches and upgrades to Linux, Slurm and OpenHPC as required.
  • Install new servers and storage, build new clusters, configure and manage Linux distributions, hypervisors (KVM) and tooling.
  • Automate where possible to increase efficiency of operations.
  • Fulfil firewall access requests to the environment.
  • Escalate priority support issues to senior staff and/or other corporate technology groups.
  • Collect and document all relevant information prior to escalation to allow senior staff to operate efficiently.
  • Document, track and monitor problems to ensure timely resolution.
  • Assist in tracking helpdesk calls pertaining to application, networking, and systems problems and issues.
  • Assign username, password and access right permissions for multiple proprietary applications, as well as client software.
  • Administer identity management and multifactor authentication, including integration between Active Directory and Linux platforms.
  • Perform hardware and software audits.
  • Research and evaluate products.
  • Provide emergency support on incidents as required.
  • Perform occasional after-hours maintenance.
  • Participate in the incident on-call rotation as required.
  • Provide day-to-day operational support.

Education & Experience Required

  • A Diploma or Degree in a relevant area of study, preferably Computer Science, together with demonstrated operational network-related experience.
  • Minimum of 5 years in Information Technology (with a related University Degree) or minimum of 7 years in Information Technology (with a three-year College Diploma).
  • Industry certifications such as MCSE or CISSP are a strong asset.

Specialized Knowledge, Skills & Abilities

  • In-depth and demonstrated experience in the installation and operation of Linux platforms in an Enterprise environment (Ubuntu/RedHat).
  • Experience in the use of KVM or other hypervisors.
  • Experience in HPC tools such as Slurm, OpenHPC, LSF or GridEngine.
  • Demonstrated knowledge of HPC clusters and use cases.
  • Working technical knowledge of network systems.
  • Working technical knowledge of current systems software, protocols and standards including Active Directory.
  • Identity management using Microsoft Identity Manager and Azure AD Connect.
  • Solid understanding of Windows-based endpoints.
  • Solid scripting experience (e.g. Bash).
  • Excellent written and oral communication skills.
  • Excellent problem-solving skills.
  • Strong analytical and troubleshooting skills.
  • Strong interpersonal and organizational skills.
  • Must be well organized and able to grasp system concepts and communicate their applications.
  • Must be capable of quickly learning new systems and associated software applications for proficient execution of tasks.
  • Ability to manage multiple demands with time related constraints in a fast-paced environment.
  • Ability to prioritize and schedule work as necessary to maintain department standards and service level agreements.
  • Ability to speak effectively before groups of internal employees, communicate technical information, create and deliver presentations and information sessions to both technical and nontechnical personnel.
  • Demonstrated experience in applying technical expertise and in-depth evaluation to solve complex problems in own area of expertise.
  • Ability to create and maintain documentation and training materials, including KB articles, for technical staff and end-user audiences.
  • Microsoft Windows experience is an asset.
  • Bilingualism (English/French) is an asset.

Working Conditions

  • Generally comfortable working conditions with lifting and onsite installations.
  • Moderate visual concentration in use of video display terminal.
  • Appropriate security clearances required.
  • Occasional off-hours support may be required.
  • Participation in group pager rotation (‘on call’).

At Telesat, we take pride in being an equal opportunity employer that values equality in the workplace. We are committed to providing the best candidate experience possible, including any required accommodations at every stage of our interview process. Qualified applicants who have been selected for an interview and require accommodations are advised to inform the Telesat Talent team accordingly. We will work with you to meet your needs. All accommodation information provided will be treated as confidential.

Seniority level
  • Mid-Senior level
Employment type
  • Full-time
Job function
  • Management and Manufacturing
Industries
  • Telecommunications
