System Administrator - High Performance Computing (HPC)
J&M Group
Job Description
Main Responsibilities
- Identify, diagnose, and resolve level two problems for users of the software and hardware, LAN and WAN, VPN, the Internet, mobile devices, and new computer technology; communicate solutions to end-users.
- Respond to more complex issues (second line support) escalated by the first line support using problem-solving skills and analysis to identify root causes of issues, determine course of action and propose creative solutions.
- Manage day-to-day operations and support of the HPC environment (Linux).
- Take ownership of capacity, availability and performance of the HPC cluster(s).
- Support end users in the submission and management of jobs based on Slurm and OpenHPC.
- Migrate existing nodes as required to Linux.
- Implement and manage a system based on Foreman or similar to manage patching and oversee cluster management.
- Implement patches and upgrades to Linux, Slurm and OpenHPC as required.
- Install new servers and storage, build new clusters, configure and manage Linux distributions, hypervisors (KVM) and tooling.
- Automate where possible to increase efficiency of operations.
- Fulfill firewall access requests for the environment.
- Escalate priority support issues to senior staff and/or other corporate technology groups.
- Collect and document all relevant information prior to escalation to allow senior staff to operate efficiently.
- Document, track and monitor problems to ensure timely resolution.
- Assist in tracking helpdesk calls pertaining to application, networking, and systems problems and issues.
- Assign usernames, passwords and access rights for multiple proprietary applications, as well as client software.
- Manage identity and multifactor authentication, including integration between Active Directory and Linux platforms.
- Perform hardware & software audits.
- Conduct product research and evaluation.
- Provide emergency support on incidents as required.
- Perform occasional after-hours maintenance.
- Participate in the incident on-call rotation as required.
- Provide day-to-day operational support.
Specialized Skills, Knowledge & Abilities
- In-depth and demonstrated experience in the installation and operation of Linux platforms in an enterprise environment (Ubuntu/Red Hat).
- Experience in the use of KVM or other hypervisors.
- Experience in HPC tools such as Slurm, LSF or GridEngine.
- Demonstrated knowledge of HPC clusters and use cases.
- Working technical knowledge of network systems.
- Working technical knowledge of current systems software, protocols and standards including Active Directory.
- Experience with identity management using Microsoft Identity Manager and Azure AD Connect.
- Solid understanding of Windows-based endpoints.
- Solid scripting experience (e.g. Bash).
- Excellent written and oral communication skills.
- Excellent problem-solving skills.
- Strong analytical and troubleshooting skills.
- Strong interpersonal and organizational skills.
- Must be well organized and able to grasp system concepts and communicate their applications.
- Must be capable of quickly learning new systems and associated software applications for proficient execution of tasks.
- Ability to manage multiple demands with time-related constraints in a fast-paced environment.
- Ability to prioritize and schedule work as necessary to maintain department standards and service level agreements.
- Ability to speak effectively before groups of internal employees, communicate technical information, create and deliver presentations and information sessions to both technical and nontechnical personnel.
- Demonstrated experience in applying technical expertise and in-depth evaluation to solve complex problems in own area of expertise.
- Ability to create and maintain documentation and training materials, including KB articles, for technical staff and end-user audiences.
- Microsoft Windows experience is an asset.
- Bilingualism (English/French) is an asset.
Seniority level
Entry level
Employment type
Job function
Information Technology
Industries
IT Services and IT Consulting