System Administrator - High Performance Computing (HPC)

J&M Group

Ottawa

On-site

CAD 56,000 - 109,000

Full time

14 days ago

Job summary

A leading company in IT Services is seeking a System Administrator for High Performance Computing (HPC) in Ottawa. The successful candidate will manage HPC environments, support users, and implement solutions in a dynamic setting. This role requires strong problem-solving skills and experience with Linux and HPC tools, offering a competitive salary and opportunities for growth.

Qualifications

  • In-depth experience in Linux platforms (Ubuntu/RedHat).
  • Experience with HPC tools such as Slurm and LSF.
  • Ability to manage multiple demands in a fast-paced environment.

Responsibilities

  • Manage day-to-day operations and support of the HPC environment.
  • Implement and manage a system for patching and cluster management.
  • Provide emergency support on incidents as required.

Skills

Problem-solving
Analytical skills
Communication
Scripting
Interpersonal skills

Education

Bachelor's degree in Computer Science or related field

Tools

Linux
Slurm
KVM
Active Directory

Job description

Main Responsibilities

  • Identify, diagnose, and resolve level two problems for users of the software and hardware, LAN and WAN, VPN, the Internet, mobile devices, and new computer technology; communicate solutions to end-users.
  • Respond to more complex issues (second line support) escalated by the first line support using problem-solving skills and analysis to identify root causes of issues, determine course of action and propose creative solutions.
  • Manage day-to-day operations and support of the HPC environment (Linux).
  • Take ownership of capacity, availability and performance of the HPC cluster(s).
  • Support end users in the submission and management of jobs based on Slurm and OpenHPC.
  • Migrate existing nodes as required to Linux.
  • Implement and manage a system based on Foreman or similar to manage patching and oversee cluster management.
  • Implement patches and upgrades to Linux, Slurm and OpenHPC as required.
  • Install new servers and storage, build new clusters, configure and manage Linux distributions, hypervisors (KVM) and tooling.
  • Automate where possible to increase efficiency of operations.
  • Fulfil firewall access requests for the environment.
  • Escalate priority support issues to senior staff and/or other corporate technology groups.
  • Collect and document all relevant information prior to escalation to allow senior staff to operate efficiently.
  • Document, track and monitor problems to ensure timely resolution.
  • Assist in tracking helpdesk calls pertaining to application, networking, and systems problems and issues.
  • Assign usernames, passwords, and access rights for multiple proprietary applications, as well as client software.
  • Support identity management and multifactor authentication, including integration between Active Directory and Linux platforms.
  • Perform hardware & software audits.
  • Conduct product research and evaluation.
  • Provide emergency support on incidents as required.
  • Perform occasional after-hours maintenance.
  • Participate in the incident on-call rotation as required.
  • Provide day-to-day operational support.

Specialized Skills, Knowledge & Abilities

  • In-depth and demonstrated experience in the installation and operation of Linux platforms in an Enterprise environment (Ubuntu/RedHat).
  • Experience in the use of KVM or other hypervisors.
  • Experience in HPC tools such as Slurm, LSF or GridEngine.
  • Demonstrated knowledge of HPC clusters and use cases.
  • Working technical knowledge of network systems.
  • Working technical knowledge of current systems software, protocols and standards including Active Directory.
  • Identity management using Microsoft Identity Manager and Azure AD Connect.
  • Solid understanding of Windows-based endpoints.
  • Solid scripting experience (e.g. Bash).
  • Excellent written and oral communication skills.
  • Excellent problem-solving skills.
  • Strong analytical and troubleshooting skills.
  • Strong interpersonal and organizational skills.
  • Must be well organized and able to grasp system concepts and communicate their applications.
  • Must be capable of quickly learning new systems and associated software applications for proficient execution of tasks.
  • Ability to manage multiple demands with time-related constraints in a fast-paced environment.
  • Ability to prioritize and schedule work as necessary to maintain department standards and service level agreements.
  • Ability to speak effectively before groups of internal employees, communicate technical information, create and deliver presentations and information sessions to both technical and nontechnical personnel.
  • Demonstrated experience in applying technical expertise and in-depth evaluation to solve complex problems in own area of expertise.
  • Ability to create and maintain documentation and training materials, including KB articles, for technical staff and end-user audiences.
  • Microsoft Windows experience is an asset.
  • Bilingualism (English/French) is an asset.

Seniority level: Entry level
Employment type: Contract
Job function: Information Technology
Industries: IT Services and IT Consulting
