Job Search and Career Advice Platform

Ativa os alertas de emprego por e-mail!

Senior Hpc Cluster Support Engineer (Bright Cluster Manager + Slurm)

Sky Systems, Inc. (Skysys)

Manaus

Teletrabalho

BRL 120.000 - 160.000

Tempo parcial

Hoje
Torna-te num dos primeiros candidatos

Cria um currículo personalizado em poucos minutos

Consegue uma entrevista e ganha mais. Sabe mais

Resumo da oferta

A leading technology firm is seeking a Senior HPC Cluster Support Engineer to maintain and support large-scale production HPC environments. This part-time role (20 hours/week) is entirely remote and requires strong experience with Bright Cluster Manager and Linux systems. Responsibilities include managing cluster operations, troubleshooting issues, and performing hardware diagnostics using remote management tools. Ideal candidates will possess skills in InfiniBand and HPC storage systems, ensuring smooth operation and user support.

Qualificações

  • Strong experience in managing large-scale HPC environments.
  • Proficient in troubleshooting and resolving cluster-related issues.
  • Knowledge of remote management tools for hardware diagnostics.

Responsabilidades

  • Manage cluster operations and job submissions.
  • Resolve node failures and networking issues.
  • Diagnose hardware faults and perform remote checks.

Conhecimentos

Strong experience with Bright Cluster Manager
Linux systems administration
Hardware diagnostics
Experience with InfiniBand
Active Directory integration for Linux

Ferramentas

Dell iDRAC
HPE iLOM
Supermicro BMC tools
Descrição da oferta de emprego
Role

HPC Cluster Support – CIBA 4 (Senior)

Position Type

Part-Time Contract (20 hrs / week)

Contract Duration

6 months

Work Hours

EST or PST

Location

100% Remote

Overview

We’re seeking a Senior HPC Cluster Support Engineer to maintain and support large-scale production HPC environments running Bright Cluster Manager and Slurm.

Key Responsibilities
  • Manage cluster operations, job submission issues, queue management, and user troubleshooting.
  • Monitor cluster health and resolve node failures, networking issues, and domain problems.
  • Diagnose hardware faults (GPUs, boards, power, nodes) and perform remote checks using BMC tools (Dell iDRAC, HPE iLOM, Supermicro).
  • Troubleshoot InfiniBand, Panasas storage, and network integration issues.
  • Coordinate repairs and elevate with vendors (ParkPlace, VDura).
  • Apply system updates, patches, and configurations.
  • Collaborate with users and provide regular status updates.
Required Skills
  • Strong experience with Bright Cluster Manager and Slurm.
  • Linux systems administration and advanced troubleshooting.
  • Hardware diagnostics and BMC remote management tools.
  • Experience with InfiniBand, HPC storage systems (Panasas), and vendor escalation.
  • Active Directory integration for Linux is a plus.
Obtém a tua avaliação gratuita e confidencial do currículo.
ou arrasta um ficheiro em formato PDF, DOC, DOCX, ODT ou PAGES até 5 MB.