Overview
The Hewlett Packard Enterprise (HPE) HPC Business Unit has an exciting opportunity for an experienced engineer dedicated to managing and executing complex High-Performance Computing (HPC) upgrades. These upgrades involve operating systems, workload managers, high-speed networks, clustered file systems, and HPC cluster managers. This role is part of our Global Remote Services (GRS) team and focuses on delivering exceptional service while ensuring seamless integration and functionality across the HPC ecosystem. The role works within a collaborative team environment with a designated lead engineer for each project. Team members contribute to project planning, knowledge sharing, coordinating efforts, and ensuring smooth execution.
Responsibilities
- Plan and execute complex HPC upgrade projects. Approximately 80% of the project lifecycle is dedicated to research, documentation preparation, and technical meetings, ensuring plans are completely documented and technically sound; the remaining 20% focuses on hands-on implementation. Manage communication between HPE and the customer.
- Perform detailed technical assessments and design upgrade plans, including OS, workload manager, Slingshot network, and clustered file system upgrades.
- Troubleshoot and resolve integration challenges across a mix of HPC technologies, providing innovative solutions to ensure minimal downtime.
- Collaborate with global teams, including Level 3 SMEs, sales, and onsite engineering personnel, to deliver seamless and customer-focused upgrades.
- Prepare and maintain comprehensive documentation, including pre-upgrade assessments, step-by-step implementation guides, and post-upgrade reports.
- Provide timely and effective communication with customers, ensuring their understanding of the upgrade process, progress, and outcomes.
- Be flexible with working hours to meet project milestones and deadlines, including occasional extended hours and weekend work as needed.
- Act as a technical service advocate, ensuring that upgrades align with customers' technical and business requirements.
- Work mainly remotely, with occasional travel for training, installations, or onsite support of HPC systems.
Qualifications
- In-depth Linux knowledge, with proficiency in Red Hat, CentOS, or similar distributions.
- Strong scripting experience with Bash and Python.
- Expertise in clustered file systems such as Lustre, CXFS, GPFS, or StorNext.
- Hands-on experience with high-speed networking (e.g., InfiniBand, Omni-Path) and workload managers (e.g., Slurm, PBS).
- Proven track record in executing complex HPC upgrades across diverse environments.
- Strong analytical and troubleshooting skills, with the ability to isolate and resolve intricate technical issues.
- Excellent organizational and project management skills, with an ability to manage multiple priorities effectively.
- Exceptional verbal and written communication skills in English, with a strong emphasis on customer-focused communication.
- Experience preparing technical documentation and reports for complex IT projects.
- Willingness to work flexible hours to achieve project milestones and provide 24x7 on-call support on a rotating basis.
Additional Desired Skills
- Experience with HPC cluster managers (e.g., Bright Cluster Manager, HPCM).
- Familiarity with Docker, Kubernetes, and RESTful APIs.
- Networking skills, including Ethernet and advanced network tuning.
- Working knowledge of Salesforce or similar CRM tools.
- Previous experience performing software upgrades, patch installations, and hardware repairs.
Requirements
- 8+ years of professional experience and a Bachelor of Arts/Science or equivalent degree in computer science or related area of study; without a degree, three additional years of relevant professional experience (11+ years in total).
Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today's complex world. Our culture thrives on finding new and better ways to accelerate what's next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.
Benefits
- Competitive salary and comprehensive social benefits.
- A diverse and dynamic work environment.
- Work-life balance support and opportunities for career development.