Senior HPC Cluster Support Engineer (Bright Cluster Manager + Slurm)

Sky Systems, Inc. (Skysys)

Diadema

Remote

BRL 120.000 - 160.000

Part-time

Posted today

Job summary

A technology services company is seeking a Senior HPC Cluster Support Engineer for a part-time remote role in Brazil. This position involves managing and supporting large-scale production HPC environments using Bright Cluster Manager and Slurm. Key responsibilities include troubleshooting cluster operations, diagnosing hardware issues, and coordinating repairs with vendors. Candidates should have strong experience in Linux systems administration, hardware diagnostics, and HPC storage systems. This is a 6-month contract position requiring 20 hours per week.

Qualifications

  • Strong experience with Bright Cluster Manager and Slurm.
  • Linux systems administration and advanced troubleshooting.
  • Hardware diagnostics and BMC remote management tools.

Responsibilities

  • Manage cluster operations, job submission issues, queue management.
  • Monitor cluster health and resolve node failures and networking issues.
  • Diagnose hardware faults and perform remote checks.

Skills

Bright Cluster Manager experience
Advanced troubleshooting in Linux
Hardware diagnostics expertise
Experience with InfiniBand
Knowledge of Panasas storage systems
Active Directory integration for Linux

Tools

Dell iDRAC
HPE iLO
Supermicro BMC tools
Job description
Role

HPC Cluster Support – CIBA 4 (Senior)

Position Type

Part-Time Contract (20 hrs / week)

Contract Duration

6 months

Work Hours

EST or PST

Location

100% Remote

Overview

We’re seeking a Senior HPC Cluster Support Engineer to maintain and support large-scale production HPC environments running Bright Cluster Manager and Slurm.

Key Responsibilities
  • Manage cluster operations, job submission issues, queue management, and user troubleshooting.
  • Monitor cluster health and resolve node failures, networking issues, and domain problems.
  • Diagnose hardware faults (GPUs, boards, power, nodes) and perform remote checks using BMC tools (Dell iDRAC, HPE iLO, Supermicro).
  • Troubleshoot InfiniBand, Panasas storage, and network integration issues.
  • Coordinate repairs and escalate issues to vendors (ParkPlace, VDura).
  • Apply system updates, patches, and configurations.
  • Collaborate with users and provide regular status updates.

Required Skills
  • Strong experience with Bright Cluster Manager and Slurm.
  • Linux systems administration and advanced troubleshooting.
  • Hardware diagnostics and BMC remote management tools.
  • Experience with InfiniBand, HPC storage systems (Panasas), and vendor escalation.
  • Active Directory integration for Linux is a plus.