HPC System Administrator, System, NSCC

Agency for Science, Technology and Research (A*STAR)

Singapore

On-site

SGD 60,000 - 80,000

Full time

Today

Job summary

A leading research agency in Singapore is looking for an HPC System Administrator to oversee the day-to-day operations of its high-performance computing (HPC) systems. The role includes monitoring system performance, managing user accounts, and responding to incidents. Candidates should hold a degree in Computer Science or a related field and have at least 2 years of Linux administration experience in HPC environments. Proficiency in scripting and cluster management tools is essential. This position offers a dynamic work environment with a focus on innovation.

Qualifications

  • Minimum 2 years of experience in Linux system administration, preferably in HPC environments.
  • Familiarity with RDMA interconnects (InfiniBand, RoCE) and parallel file systems (Lustre, GPFS, BeeGFS).
  • Understanding of basic network protocols such as DHCP, DNS, TFTP and SMTP.

Responsibilities

  • Administer HPC compute nodes, storage systems, and internal networks.
  • Monitor system health and apply patches, updates, and configuration changes.
  • Manage user accounts and job queues.

Skills

Linux system administration
Scripting (Python, Bash)
Troubleshooting skills
Cluster management tools (xCAT, BCM, HPCM)
Job scheduling (PBS Pro, Slurm)

Education

Degree in Computer Science, Engineering, IT or related field

Tools

Grafana
Prometheus

Job description
Job Summary

The HPC System Administrator will manage day-to-day operations of HPC systems, ensuring stability, security, and performance. This role includes system monitoring, patching, user account management, job queue oversight, and incident resolution in support of the supercomputing environment at NSCC (National Supercomputing Centre Singapore).

Roles and Responsibilities
  • System Operations & Maintenance
    • Administer HPC compute nodes, storage systems, and internal networks.
    • Monitor system health using tools such as Grafana, Prometheus, and custom scripts.
    • Apply patches, updates, and configuration changes to ensure stability.
  • User & Job Management
    • Manage user accounts, access controls, and authentication mechanisms.
    • Monitor job queues and assist users with job submission and scheduling issues.
    • Implement and enforce resource allocation policies.
  • Incident Response & Troubleshooting
    • Respond to system alerts and user-reported issues.
    • Document incidents, resolutions, and preventive measures.
    • Collaborate with engineers on escalated issues.
  • Security & Compliance
    • Perform regular security checks and vulnerability assessments.
    • Ensure compliance with organizational and regulatory security policies.
  • Documentation & Reporting
    • Maintain system operation logs and configuration documentation.
    • Generate reports on system usage, performance, and incidents.

Qualifications
  • Degree in Computer Science, Engineering, IT or related field.
  • Minimum 2 years of experience in Linux system administration, preferably in HPC environments.
  • Familiarity with cluster management tools (xCAT, BCM, HPCM).
  • Experience with job schedulers (PBS Pro, Slurm).
  • Basic understanding of RDMA interconnects (InfiniBand, RoCE) and parallel file systems (Lustre, GPFS, BeeGFS).
  • Understanding of basic network protocols such as DHCP, DNS, TFTP and SMTP.
  • Proficient in scripting (Python, Bash).
  • Strong troubleshooting and communication skills.