About the Role:
We are seeking an experienced Architecture Expert to lead the design, strategy, and evolution of our High-Performance Computing (HPC) environment. In this role, you will define the architectural vision for our HPC infrastructure and guide the selection and implementation of cutting-edge technologies (compute, storage, interconnects, schedulers). You will also ensure the platform effectively supports the demanding computational research, simulation, and data analysis workloads relevant to our software and industrial applications.
Responsibilities:
- Lead the architectural design, planning, and implementation of scalable, reliable, and efficient HPC systems.
- Develop and maintain the strategic roadmap for HPC infrastructure, including hardware, software, storage, and networking components.
- Provide technical leadership and mentorship to HPC architects, engineers, and administrators.
- Evaluate emerging HPC technologies and trends, and make recommendations for adoption.
- Collaborate with researchers, data scientists, software engineers, and application owners to understand computational requirements and design appropriate HPC solutions.
- Define standards and best practices for HPC system configuration, job scheduling, resource management, and performance optimization.
- Ensure the security and integrity of the HPC environment.
- Oversee architectural aspects of system upgrades, expansions, and technology refreshes.
- Work with vendors to evaluate and select HPC hardware and software components.
- Contribute to capacity planning and performance tuning efforts.
Qualifications:
Minimum Qualifications:
- Bachelor's degree in Computer Science, Engineering, Physics, or a related computational field; Master's degree or PhD preferred.
- 8+ years of experience working with High-Performance Computing (HPC) systems.
- 3+ years of experience in designing, architecting, or leading technical implementations of complex HPC environments.
- Deep understanding of HPC architectures, parallel file systems (e.g., Lustre, GPFS), high-speed interconnects (e.g., InfiniBand, Slingshot), job schedulers (e.g., Slurm, LSF), and MPI/parallel programming concepts.
- Experience leading technical projects or initiatives.
- Experience applying HPC within specific industrial contexts (e.g., Energy, Aerospace, Life Sciences, Finance).
- Experience with cloud-based HPC or hybrid HPC environments.
- Programming skills (e.g., Python, C/C++, Fortran) and experience with scientific computing applications.