Senior HPC Support Engineer

JR United Kingdom

Bournemouth

Hybrid

GBP 50,000 - 75,000

Full time

4 days ago

Job summary

NScale is seeking a Senior HPC Support Engineer in Bournemouth to join their innovative team. This hands-on role focuses on optimizing HPC and AI workloads on GPU infrastructure. Candidates must possess strong troubleshooting skills, especially in complex environments, and have a proactive, customer-first mindset. The position offers a chance to excel in a fast-paced, dynamic setting, working closely with customers to enhance their AI performance.

Qualifications

  • Proven experience supporting HPC and/or AI workloads in production environments.
  • Strong expertise with Slurm workload manager, including tuning and troubleshooting.
  • Solid Linux administration skills and troubleshooting experience.

Responsibilities

  • Provide expert-level support for customer HPC and AI workloads running in production.
  • Troubleshoot complex system-level issues across networking, storage, containers, and GPUs.
  • Collaborate with engineering and vendor partners to resolve hardware/software compatibility and performance issues.

Skills

HPC workloads
AI workloads
System-level debugging
Linux administration
Communication
Problem-solving

Tools

Slurm
NVIDIA GPUs
AMD GPUs
OpenMPI
Ansible
Terraform
Prometheus
Grafana

Job description

Client: Nscale
Location: Bournemouth, United Kingdom
Job Category: Other
EU work permit required: Yes
Job Views: 4
Posted: 31.05.2025
Expiry Date: 15.07.2025

Join NScale as a Senior HPC Support Engineer

NScale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI startups and enterprises. Our platform reduces the complexity of AI development, empowering our customers to achieve faster innovation and better outcomes.

Our mission is to enable the AI breakthroughs of tomorrow by delivering exceptional infrastructure today. At NScale, we’re builders at heart — driven by ownership, innovation, and urgency.

About the Role

We're looking for a Senior HPC Support Engineer to join our fast-growing team, focused on enabling and optimising HPC and AI workloads on GPU-accelerated infrastructure.

You’ll work directly with customers solving some of the most complex problems in AI, helping them troubleshoot and optimize performance in compute-intensive, distributed environments.

This is a hands-on role requiring deep technical acumen, exceptional problem-solving ability, and comfort working across a diverse set of technologies including GPUs (NVIDIA and AMD), InfiniBand networking, and orchestration systems like Slurm.

What You’ll Be Doing

  • Provide expert-level support for customer HPC and AI workloads running in production.
  • Troubleshoot complex system-level issues across networking, storage, containers, and GPUs.
  • Collaborate with engineering and vendor partners to resolve hardware/software compatibility and performance issues.
  • Analyse distributed workloads and assist with tuning of MPI-based applications.
  • Develop internal tools and automation to improve support workflows.
  • Contribute to documentation and knowledge-sharing initiatives.
  • Participate in on-call rotations to support high-priority incidents and escalations.

About You

Skills & Experience

  • Proven experience supporting HPC and/or AI workloads in production environments.
  • Strong expertise with Slurm workload manager, including tuning and troubleshooting.
  • Proficiency with system-level debugging, including kernel modules and network interfaces.
  • Experience with GPU compute platforms (NVIDIA and/or AMD) and associated libraries.
  • Familiarity with MPI libraries (e.g., OpenMPI), InfiniBand, and high-speed Ethernet networking.
  • Solid Linux administration skills and troubleshooting experience.
  • Working knowledge of HPC container runtimes (e.g., Singularity, Apptainer).
  • Exposure to provisioning and automation tools (e.g., Ansible, PXE, Terraform).
  • Experience with monitoring tools such as Prometheus, Grafana, and DCGM.
  • Understanding of GPU/accelerator toolchains like CUDA or ROCm.
  • A proactive, customer-first mindset with strong communication skills.
  • Ability to work effectively in both individual and team settings.
  • Comfort operating in fast-paced, ambiguous, high-growth environments.

Nice to have

  • Experience with OpenStack and troubleshooting infrastructure in cloud environments.
  • Kubernetes expertise, particularly in HPC or AI workload contexts.
  • Familiarity with distributed file systems and advanced storage configurations.
  • Understanding of GPU virtualization and multi-tenant HPC architecture.
  • Exposure to machine learning frameworks and AI optimization workflows.
  • Scripting skills in Python, Bash, or similar for automation and tooling.

Personal Attributes

  • Proactive and self-motivated, with a strong sense of ownership.
  • Thrives in a fast-paced, dynamic, and high-growth environment.
  • Collaborative team player with a passion for delivering outstanding customer and stakeholder experiences.
  • Strong attention to detail and documentation skills.
  • Excellent communication skills, both written and verbal.
  • A self-starter mindset with a “see a problem, fix a problem” mentality.
  • Experience in designing and implementing processes to optimize deployment workflows.

Please Note: We're currently working remotely, but plan to transition to a hybrid working model in 2025 as we look to secure a modern office space in London.

In all we do, our core values guide us:

  • Relentless Innovation
  • Ownership and Accountability
  • Openness and Transparency
  • Customer-Centric Focus
  • Sustainability

At NScale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we warmly welcome applications from individuals of all backgrounds, experiences, and perspectives. We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.
