System Administrator

Turtle Island Staffing

Ontario

Remote

CAD 70,000 - 100,000

Full time

Yesterday

Job summary

A leading staffing agency is seeking a System Administrator with expertise in HPC clusters to work remotely. The role involves overseeing cluster operations, troubleshooting issues, and collaborating with scientists to meet their HPC needs. Ideal candidates will have strong Linux system administration skills and experience with job schedulers and applications.

Qualifications

  • Hands-on experience with HPC clusters, including hardware and networking.
  • Experience in troubleshooting HPC environments.
  • Proficiency in building and troubleshooting applications.

Responsibilities

  • Oversee and maintain an HPC cluster, managing hardware and configurations.
  • Troubleshoot HPC environments to restore operations quickly.
  • Document processes and best practices for knowledge continuity.

Skills

HPC clusters
Troubleshooting
Linux OS
Documentation
Networking

Tools

Git
Ansible Playbooks
PBS Pro/Torque
SLURM
CUDA

Job description

Our client is seeking a System Administrator to join their team on a remote basis. To be considered for this position, you will need experience maintaining and troubleshooting HPC clusters.

About you
To be considered, you will need:
  • Hands-on experience with HPC clusters, including hardware, image management, local networking, and schedulers.
  • A strong background in troubleshooting HPC environments to resolve incidents efficiently.
  • The ability to assess scientists' HPC support needs and develop task plans accordingly.
  • Proficiency in building, installing, and troubleshooting applications (GNU, Intel, Fortran, Nvidia).
  • Familiarity with open-source and commercial software such as Python, Anaconda, Bash scripts, EasyBuild, and Spack, as well as MPI implementations (MPICH, OpenMPI, IntelMPI, HPMPI).
  • System administration skills for Linux OS, user account management, and configuration tools (Git, MS DevOps, Ansible Playbooks).
  • Knowledge of RPM/DEB packages, environment modules, and ThinLinc troubleshooting.
  • Expertise in job schedulers (PBS Pro/Torque, SLURM, SGE) and CUDA installations, including GPU troubleshooting.
  • Hardware management experience, including memory upgrades, storage arrays, and power and network cabling.
  • Strong documentation skills to ensure knowledge continuity.
  • Secret-level security clearance (or eligibility to obtain it).

About the role
If hired, you will:
  • Oversee and maintain an HPC cluster, managing hardware, networking, and scheduler configurations.
  • Troubleshoot HPC environments to restore operations quickly in case of incidents.
  • Work with scientists to evaluate their HPC needs and develop task plans.
  • Install and support applications, resolve runtime issues, and assist with in-house software.
  • Manage Linux system operations, including patching, account management, and configuration via Git and Ansible.
  • Support and troubleshoot job schedulers and CUDA installations.
  • Handle hardware maintenance, including memory upgrades, storage management, and networking.
  • Document processes and best practices to ensure knowledge continuity.
