Enable job alerts via email!

High Performance Computing Engineer (HPC Engineer)

ZipRecruiter

Cambridge

Hybrid

GBP 50,000 - 90,000

Full time

12 days ago

Boost your interview chances

Create a job specific, tailored resume for higher success rate.

Job summary

An established industry player is seeking a skilled HPC Engineer to join their Infrastructure Engineering team. This role focuses on managing and evolving high-performance computing environments while driving innovation through automation and cloud technologies. You will ensure the performance and scalability of HPC systems, working closely with engineering teams to optimize compute-intensive workloads. Join a global team dedicated to solving complex challenges in a collaborative environment, offering a hybrid working model that promotes work-life balance and flexibility.

Benefits

Hybrid working model
Work-life balance
Cutting-edge HPC infrastructure
Global team collaboration

Qualifications

  • 4-8 years of experience in HPC system administration or engineering.
  • Strong Linux system administration skills, RHCE certification is a plus.
  • Hands-on experience with HPC job schedulers like IBM Spectrum LSF.

Responsibilities

  • Operate and support HPC clusters, including job scheduling and performance tuning.
  • Automate administrative tasks using Shell, Bash, or Python scripting.
  • Collaborate with DevOps teams to integrate HPC solutions with cloud platforms.

Skills

HPC system administration
Linux (RedHat)
Shell scripting
Bash scripting
Python scripting
HPC job schedulers
Cloud platforms (AWS, GCP)
Infrastructure monitoring tools
Vulnerability management

Education

Degree in Computer Science
Degree in Engineering

Tools

IBM Spectrum LSF
SLURM
Terraform
Ansible
FlexLM

Job description

Job Description

About the Role:

We are seeking a skilled and passionate HPC Engineer to join our Infrastructure Engineering team. This role focuses on managing, supporting, and evolving our High Performance Computing (HPC) environment while driving innovation through automation and modern cloud technologies.

You’ll play a key role in ensuring the performance, availability, and scalability of HPC systems used by engineering teams across the organization. From managing workload schedulers to enhancing security and performance tuning, you will be at the heart of our mission to deliver world-class compute infrastructure.

Key Responsibilities:

  • Operate and support HPC clusters, including job scheduling, resource management, and performance tuning.
  • Work closely with engineering teams to ensure optimal usage of HPC systems for compute-intensive workloads.
  • Manage and support HPC workload management tools, such as IBM Spectrum LSF.
  • Automate common administrative and maintenance tasks using Shell, Bash, or Python scripting.
  • Ensure the HPC environment remains secure, patched, and compliant with evolving security standards.
  • Support HPC-related licensing tools (e.g., FlexLM, EDA licenses).
  • Monitor system performance, availability, and proactively resolve bottlenecks and failures.
  • Provide technical support to engineers and researchers using the HPC platform.
  • Collaborate with infrastructure and DevOps teams to integrate HPC solutions with cloud platforms (e.g., AWS, GCP).
  • Document architecture, configurations, and procedures for knowledge sharing and operational transparency.

Required Skills & Qualifications:

  • 4–8 years of experience in HPC system administration or engineering.
  • Degree in Computer Science, Engineering, or a related technical field.
  • Strong Linux (RedHat ) system administration skills – RHCE certification is a plus.
  • In-depth knowledge of HPC infrastructure, including cluster management and optimization.
  • Hands-on experience with HPC job schedulers such as IBM Spectrum LSF, SLURM, or similar.
  • Strong scripting skills in Shell, Bash, Python, or Perl.
  • Experience with cloud platforms (AWS, GCP, Azure) is a plus.
  • Familiarity with tools for infrastructure monitoring, remote access, and interactive technologies (e.g., ETX).
  • Understanding of vulnerability management, security patching, and system hardening.
  • Experience working in global, distributed teams and supporting technical end users.

Nice to Have:

  • Experience with Infrastructure as Code tools like Terraform and Ansible.
  • Familiarity with Continuous Integration (CI) tools and artifact management (e.g., Artifactory).
  • Exposure to EDA tools and software license management systems.

Why Join Us?

  • Take ownership of cutting-edge HPC infrastructure powering world-class engineering workloads.
  • Be part of a global team solving complex performance and scalability challenges.
  • Drive innovation and automation in a technically rich and collaborative environment.
  • Hybrid working model with flexibility and work-life balance.
Get your free, confidential resume review.
or drag and drop a PDF, DOC, DOCX, ODT, or PAGES file up to 5MB.

Similar jobs

Scala/Spark Optimization Engineer (HPC)

TN United Kingdom

Remote

GBP 50.000 - 90.000

24 days ago