Sr Systems Engineer HPC

Rackspace

United States

Remote

USD 90,000 - 150,000

Full time

30+ days ago

Job summary

An innovative firm is seeking a highly skilled HPC System Engineer to join their remote team. In this role, you will design, implement, and optimize high-performance computing infrastructures, working closely with researchers and engineers. You will be responsible for managing HPC clusters, ensuring efficient operation, and troubleshooting issues. This position offers a unique opportunity to contribute to cutting-edge projects while collaborating with top professionals in the field. If you are passionate about technology and eager to make a significant impact, this role is perfect for you.

Qualifications

  • 10+ years of experience in systems, with 5 years in HPC environments.
  • Strong knowledge of Linux systems and cluster management tools.

Responsibilities

  • Install and maintain HPC clusters, optimizing performance and reliability.
  • Collaborate with researchers to meet computational needs and provide support.

Skills

Linux operating systems
HPC experience
Scripting languages (R, Python, Bash)
Communication skills
Cluster management tools (Slurm, PBS)
Configuration management software (Terraform, Ansible)
Data transfer protocols

Education

Bachelor's degree in computer science, engineering, or related field

Tools

Slurm
PBS
Terraform
Ansible

Job description

Job Summary: Rackspace is seeking a highly skilled and motivated HPC System Engineer to join our team. You’ll work directly for one of our flagship clients, designing, implementing, maintaining, and optimizing their high-performance computing (HPC) infrastructure. You will work closely with researchers, scientists, and other engineers to ensure the efficient and reliable operation of the HPC systems.

Work Location: 100% Remote. Because this role supports a customer in the Seattle area, we prefer to hire in the PST or CST time zones.

Travel: There may be minimal travel to either San Antonio, TX or Seattle, WA.

Responsibilities:

  • Install, configure, and maintain HPC clusters, including hardware and software components.
  • Monitor system performance, identify bottlenecks, and implement solutions to optimize performance.
  • Manage user accounts, permissions, and resource allocation.
  • Perform regular system maintenance, updates, and patching.
  • Troubleshoot and resolve hardware and software issues in a timely manner.
  • Participate in the design and planning of HPC infrastructure upgrades and expansions.
  • Evaluate and recommend hardware and software solutions to meet evolving computational needs.
  • Implement and manage storage systems, networking infrastructure, and interconnects (e.g., InfiniBand).
  • Optimize system configurations and application performance for HPC workloads.
  • Profile and analyze application performance to identify areas for improvement.
  • Implement and utilize performance monitoring tools and techniques.
  • Provide technical support and training to HPC users.
  • Collaborate with researchers and scientists to understand their computational requirements.
  • Work closely with HPC architects and engineers to ensure that research needs are met.
  • Document system configurations, procedures, and best practices.
  • Assist HPC engineers and architects with day-to-day operations and ticket management.
  • Implement and maintain security measures to protect HPC infrastructure and data.
  • Ensure compliance with relevant security policies and regulations.
  • Manage data backups and disaster recovery procedures.

Qualifications:

  • Bachelor's degree in computer science, engineering, or a related field. Experience may substitute for the degree.
  • Minimum of 10 years of experience working with systems, with 5 years specifically in HPC.
  • Strong knowledge of Linux operating systems (e.g., Rocky, Ubuntu).
  • Experience with cluster management tools (e.g., Slurm, PBS).
  • Familiarity with high-speed interconnects (e.g., InfiniBand, Ethernet).
  • Knowledge of parallel file systems (e.g., Lustre, Ceph, GPFS).
  • Proficiency in scripting languages (e.g., R, Python, Bash).
  • Understanding of HPC hardware architectures and technologies (e.g., CPUs, GPUs, memory).
  • Strong, demonstrated experience with a major configuration management tool (e.g., Terraform, Ansible), including application packaging and installation.
  • Must have strong knowledge of Linux security and Linux shell scripting.
  • Strong communication and interpersonal skills.
  • Knowledge of data transfer protocols and large-scale storage solutions.