System Administrator (High-Performance Computing - HPC)

SUTD (Singapore University of Technology & Design)

Singapore

On-site

SGD 45,000 - 75,000

Full time

Yesterday

Job summary

A leading university in Singapore seeks an HPC Systems Administrator to design and maintain high-performance computing clusters. This role includes managing resource scheduling and supporting scientific computing workflows, requiring strong Linux skills and a degree in a related field.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or related field required.
  • 1+ year experience in HPC systems administration.
  • Proficiency in Linux and shell scripting essential.

Responsibilities

  • Design, deploy, and maintain high-performance computing clusters.
  • Support scientific computing workflows and assist users.
  • Monitor system performance and troubleshoot issues.

Skills

Linux system administration
Shell scripting
Strong communication
Collaboration skills

Education

Bachelor’s degree in Computer Science
Engineering degree

Tools

Docker
Singularity
InfiniBand
Lustre
GPFS
BeeGFS

Job description

Responsibilities:

  • Design, deploy, and maintain high-performance computing clusters.

  • Manage resource scheduling systems, e.g., SLURM (Simple Linux Utility for Resource Management) and PBS (Portable Batch System).

  • Support scientific computing workflows and assist users in optimizing applications.

  • Monitor system performance and troubleshoot hardware/software issues.

  • Provide systems administration and management services, including system availability statistics, IT support, system hardening, patching, onboarding and decommissioning, and other systems-related support.

  • Collaborate with researchers and technical teams on computing needs and solutions.

  • Ensure security, updates, and compliance across GPU (Graphics Processing Unit) cluster infrastructure.

  • Handle emergency situations and proactively resolve issues.

  • Support any other AI Mega Center tasks as instructed by the supervisor.

Requirements:

  • At minimum, a Bachelor’s degree in Computer Science, Engineering, or a related field.

  • At least 1 year of experience in HPC systems administration or engineering.

  • Proficiency with Linux system administration and shell scripting.

  • Experience with parallel computing frameworks and HPC workload managers is advantageous.

  • Familiarity with networking, storage systems, and performance tuning.

  • Experience managing parallel file systems such as Lustre, GPFS (General Parallel File System), and BeeGFS (Fraunhofer Parallel File System).

  • Good knowledge of RDMA (Remote Direct Memory Access) interconnects such as InfiniBand and RoCE (RDMA over Converged Ethernet).

  • Experience with containerization technologies (e.g., Docker, Singularity) and virtual machines.

  • Knowledge of cloud-based HPC is a plus.

  • Strong communication and collaboration skills.
