Senior HPC Infrastructure Engineer

ZipRecruiter

England

Remote

GBP 60,000 - 80,000

Full time

Yesterday

Job summary

A pioneering cloud infrastructure company is seeking an experienced professional to design and deliver high-performance computing (HPC) clusters. This fully remote role involves leading architecture and deployment projects while collaborating with teams and suppliers. The ideal candidate will have deep knowledge of high-speed networking technologies and experience with automation tools like Ansible. Enjoy the benefits of a flexible workplace, share options, and an unlimited holiday policy.

Benefits

Share options
Unlimited holiday policy
100% remote working
Opportunities for development
Collaborative team environment
Enhanced family-friendly policies
Flexible workplace

Qualifications

  • Proven experience managing and tuning HPC job schedulers.
  • Deep knowledge of high-speed networking technologies.
  • Proficiency in using Ansible for automation and configuration management.
  • Strong networking fundamentals, ideally with experience in complex environments.
  • Familiarity with planning and supporting power, cooling, and rack layouts.
  • End-to-end experience deploying and scaling HPC clusters.
  • Understanding of GPU-optimised server hardware and operating systems.
  • Comfortable scripting in Bash, Python, or similar for deployment and maintenance tasks.

Responsibilities

  • Lead architecture and deployment projects for HPC clusters.
  • Work closely with internal teams and external suppliers.
  • Plan hardware and data centre requirements.
  • Configure networks, storage, and compute management software.
  • Support service teams with escalations.
  • Collaborate with software engineers to enhance platform capabilities.
  • Stay up to date with the latest HPC hardware.

Skills

Slurm management and tuning
InfiniBand and RoCE knowledge
Ansible proficiency
Strong networking fundamentals
Data centre infrastructure familiarity
Cluster deployment experience
GPU-optimised server architecture understanding
Scripting in Bash or Python

Job description

Your new company
I've partnered exclusively with a pioneering company that's shaping the future of cloud infrastructure. Their innovative, high-performance, GPU-optimised platform is driving advancements in AI and HPC, while also championing sustainability for a greener, more efficient world. This role is fully remote, with no expectation to ever be in an office. You'll also enjoy the fantastic perk of unlimited holiday, giving you the freedom to recharge and thrive.

Your new role
This is a hands-on, fully remote role focused on designing and delivering high-performance computing (HPC) clusters. You'll lead end-to-end architecture and deployment projects, working closely with internal teams and external suppliers to build scalable, GPU-optimised environments. From planning hardware and data centre requirements to configuring networks, storage, and compute management software, you'll be at the heart of technical delivery. The role also involves supporting service teams with escalations, collaborating with software engineers to enhance platform capabilities, and staying up to date with the latest in HPC hardware. It's a great opportunity for someone who thrives in project-led infrastructure work and wants to help shape cutting-edge HPC solutions.

What you'll need to succeed

  • Slurm: Proven experience managing and tuning HPC job schedulers.
  • InfiniBand and RoCE: Deep knowledge of high-speed networking technologies.
  • Ansible: Proficiency in using Ansible for automation and configuration management.
  • Networking: Strong networking fundamentals, ideally with experience in complex environments.
  • Data Centre Infrastructure: Familiarity with planning and supporting power, cooling, and rack layouts.
  • Cluster Deployment: End-to-end experience deploying and scaling HPC clusters.
  • Server Architecture: Understanding of GPU-optimised server hardware and operating systems.
  • Scripting & Automation: Comfortable scripting in Bash, Python, or similar for deployment and maintenance tasks.

What you'll get in return

  • Share options.
  • Unlimited holiday policy.
  • 100% remote working.
  • Fantastic opportunities to develop - they make a habit of promoting in-house.
  • A great team with a passion for working collaboratively.
  • Enhanced family-friendly policies.
  • A truly flexible workplace!

What you need to do now
If you're interested in this role, click 'apply now' to forward an up-to-date copy of your CV, or call us now.

Hays Specialist Recruitment Limited acts as an employment agency for permanent recruitment and employment business for the supply of temporary workers. By applying for this job you accept the T&Cs, Privacy Policy and Disclaimers which can be found on our website.
