HPC System Administrator, System, NSCC

A*STAR RESEARCH ENTITIES

Singapore

On-site

SGD 60,000 - 80,000

Full time

Today

Job summary

A leading research organization in Singapore is seeking an HPC System Administrator to manage and ensure the stability of HPC systems. The role involves system operations, user and job management, incident response, and security compliance. Candidates should have a degree in Computer Science or related fields, with a minimum of 2 years of relevant Linux administration experience. Proficiency in scripting and cluster management tools is preferable. Strong troubleshooting and communication skills are essential.

Qualifications

  • Minimum 2 years of experience in Linux system administration, preferably in HPC environments.
  • Basic understanding of RDMA interconnects (InfiniBand, RoCE) and parallel file systems (Lustre, GPFS, BeeGFS).
  • Understanding of basic network protocols such as DHCP, DNS, TFTP, and SMTP.

Responsibilities

  • Administer HPC compute nodes, storage systems, and internal networks.
  • Monitor system health using tools like Grafana, Prometheus, and custom scripts.
  • Respond to system alerts and user-reported issues.

Skills

Linux system administration
Scripting (Python, Bash)
Cluster management tools (xCAT, BCM, HPCM)
Job schedulers (PBS Pro, Slurm)
Troubleshooting skills
Communication skills

Education

Degree in Computer Science, Engineering, IT or related field

Job description
Job Summary

The HPC System Administrator will manage day-to-day operations of HPC systems, ensuring stability, security, and performance. This role includes system monitoring, patching, user account management, job queue oversight, and incident resolution to support NSCC’s supercomputing environment.

Roles and Responsibilities
1. System Operations & Maintenance
  • Administer HPC compute nodes, storage systems, and internal networks.
  • Monitor system health using tools like Grafana, Prometheus, and custom scripts.
  • Apply patches, updates, and configuration changes to ensure stability.
2. User & Job Management
  • Manage user accounts, access controls, and authentication mechanisms.
  • Monitor job queues and assist users with job submission and scheduling issues.
  • Implement and enforce resource allocation policies.
3. Incident Response & Troubleshooting
  • Respond to system alerts and user-reported issues.
  • Document incidents, resolutions, and preventive measures.
  • Collaborate with engineers on escalated issues.
4. Security & Compliance
  • Perform regular security checks and vulnerability assessments.
  • Ensure compliance with organizational and regulatory security policies.
5. Documentation & Reporting
  • Maintain system operation logs and configuration documentation.
  • Generate reports on system usage, performance, and incidents.
Qualifications
  • Degree in Computer Science, Engineering, IT or related field.
  • Minimum 2 years of experience in Linux system administration, preferably in HPC environments.
  • Familiarity with cluster management tools (xCAT, BCM, HPCM).
  • Experience with job schedulers (PBS Pro, Slurm).
  • Basic understanding of RDMA interconnects (InfiniBand, RoCE) and parallel file systems (Lustre, GPFS, BeeGFS).
  • Understanding of basic network protocols such as DHCP, DNS, TFTP, and SMTP.
  • Proficient in scripting (Python, Bash).
  • Strong troubleshooting and communication skills.