Senior HPC Systems Engineer (Remote)

RedLine Performance Solutions

California (MO)

Remote

USD 90,000 - 150,000

Full time

30+ days ago

Job summary

An established industry player is seeking a Senior HPC Systems Engineer to join a dynamic team supporting NASA's High Performance Computing initiatives. This role involves providing Supercomputing Systems Administration, enhancing batch scheduling systems, and ensuring optimal performance of HPC resources. The ideal candidate will possess extensive experience in HPC environments, strong scripting abilities, and excellent communication skills. Join a forward-thinking company that values innovation and excellence, and play a key role in advancing high-performance computing solutions that make a significant impact in the field.

Qualifications

  • 10+ years of experience in HPC systems administration and Linux systems.
  • Strong scripting skills in Python, Perl, or Bash for automation.

Responsibilities

  • Oversee HPC integrations and develop enhancements to batch scheduling.
  • Provide support to users and resolve HPC system issues.

Skills

HPC Systems Administration
Linux/UNIX User Support
Scripting (Python, Perl, Bash)
System Performance Analysis
Customer Interaction
Software Development
Networking Knowledge
Technical Writing

Education

Bachelor of Science in Computer Science

Tools

PBSPro
Puppet
Ansible
Git

Job description

RedLine Performance Solutions (RedLine) has been in the HPC solutions engineering services business for 25 years and is consistently determined to keep the "bar of excellence" quite high for new hires. This enables RedLine to accomplish what other firms cannot and promotes a high level of staff retention. We offer services ranging from full life cycle HPC systems engineering to remote managed services to HPC program analysis.

We are seeking a Senior HPC Systems Engineer to join our NASA NACS High Performance Computing team at NASA's Ames Research Center in Mountain View, CA. This role primarily provides Supercomputing Systems Administration support for our NASA NACS High Performance Computing (HPC) contract.

U.S. citizenship and the ability to obtain a Public Trust security clearance are mandatory requirements for this position. This position can be remote, but work hours follow Pacific time zone business hours. Travel to the customer site will be required 2-3 times a year.

An individual at this skill level should have demonstrated extensive experience working with common HPC batch schedulers (e.g., PBS, Slurm, or Moab/Torque) while supporting users of HPC resources with the various issues they may have getting applications to execute efficiently. This individual should demonstrate experience installing, maintaining, and upgrading HPC systems. The individual, along with the entire HPC team, will be engaged in the day-to-day operations and support of the HPC resources. Activities may include system patching, operating system upgrades, deploying new systems, writing scripts, and troubleshooting system issues on the HPC system. The ability to interact with users to determine symptoms, and then reproduce their issues to isolate the root cause of failure, is a critical skill for this position. There will also be activities in testing, benchmarking, user tool scripting, and analyzing trouble tickets to find patterns indicating system or user education issues.

Duties and Responsibilities:
  • Oversee and directly contribute to significant ongoing HPC integrations into the environment
  • Design and develop enhancements to the PBSPro batch scheduler based on customer-driven requirements
  • Apply best practices in system engineering, delivering projects on time, on budget, and with excellent quality
  • Provide support to staff and end users to resolve HPC system issues
  • Mentor junior staff and cross-train peers
  • Provide after-hours and weekend support as required
  • Moderate and contribute to Supercomputing Systems Administration activities, including:
    • Day-to-day operations of the Linux HPC clusters and storage systems
    • Proactively monitoring, analyzing, and correcting system issues
    • Development of scripts to automate repetitive tasks or tools to enhance support of the HPC systems
    • System performance analysis and tuning
    • Building, installing, and supporting user-requested software
    • Supporting evaluation and assessment of new HPC technology
    • Resolving user-reported issues and managing support ticket requests in Remedy
Requirements:
  • Bachelor of Science degree in Computer Science or related field
  • Strong computer science background with in-depth systems-level knowledge in operating systems and networking
  • Solid understanding of the software development process, including requirements, use cases, design, coding, documentation and testing of scalable, distributed applications in a Linux environment
  • A minimum of 10 years of experience with HPC systems administration
  • A minimum of 10 years of experience developing system software in heterogeneous, multi-platform HPC environments
  • Demonstrated equivalent of 10 years of Linux/UNIX user support experience and hands-on experience with administration of Linux systems
  • Experience working with HPC applications and familiarity with at least one of C, C++, or Fortran
  • Superior scripting skills and excellent attention to detail; proficiency in at least one of Python, Perl, or Bash
  • Strong ability to interact with customers to understand needs, elicit requirements, and obtain feedback on prototype solutions
  • Excellent communication and people skills; excellent time management and organizational skills
  • Experience with system configuration management tools (e.g., Puppet, Ansible)
  • Experience with revision control software (e.g., Git)
  • Proficiency in technical writing
Preferred Skills:
  • Experience with Lustre and InfiniBand
  • Familiarity/proficiency with OpenMP and Message Passing Interface (MPI) programming
  • Experience with cloud technologies (AWS, Azure, GCP), OpenStack, or Kubernetes is a plus

To learn more about RedLine, please visit our website at www.RedLinePerf.com
