
HPC AI Infrastructure Hardware Manager

KLA-Tencor (Singapore) Pte Ltd

Singapore

On-site

SGD 80,000 - 110,000

Full time

Today

Job summary

A leading company in the HPC sector is seeking an experienced professional to manage teams and drive development for high-performance computing platforms. The role requires expertise in Linux systems, hardware design, and collaboration across teams. Success in this position involves mentoring, project execution, and ensuring that HPC solutions meet evolving customer needs while maintaining strong relationships across functions.

Qualifications

  • 3+ years of experience managing and mentoring teams.
  • Experience with HPC technologies and Linux hardware ecosystems.
  • Strong troubleshooting and analytical skills with monitoring tools.

Responsibilities

  • Drive team growth, mentoring members and ensuring project execution.
  • Collaborate with OEMs on optimal HPC solutions and roadmaps.
  • Develop and deploy Embedded HPC infrastructure tailored to business needs.

Skills

Team Management
Linux Systems Administration
HPC infrastructure
Networking
Security
Performance Tuning

Education

Engineering Degree (Computer Science, Computer Engineering)

Tools

Chef
Ansible
Salt
Packer
Kubernetes
Docker
Singularity

Job description

The ideal candidate will have a strong understanding of HPC infrastructure, experience in deriving hardware specifications based on requirements, and proficiency in product lifecycle management. They will engage with teams to understand their requirements, drive development for our HPC platforms, and collaborate with other teams for integration. The candidate should also have expertise in hardware system design, Linux systems administration, container orchestration, networking, security, diagnostics tooling, and performance tuning. Experience integrating HPC with storage and data platforms, and testing and optimizing that integration, is also essential.

Principal Responsibilities:

  • Drive team growth and development, providing mentorship and support to team members.

  • Ensure the successful execution of projects, meeting deadlines and delivering high-quality results.

  • Work with various OEMs to understand their product offerings and roadmaps in order to create optimal HPC solution offerings.

  • Collaborate with other sub-system teams on developing HPC Cluster Roadmaps that meet Product Requirements.

  • Collaborate within customer-focused teams to design, develop, test, and deploy Embedded HPC infrastructure in alignment with business needs.

  • Foster strong relationships with Product and Program Management, Software Engineering, Manufacturing, and Service teams to ensure the HPC platforms effectively meet their requirements.

Qualifications/Skills:

  • 3+ years' experience managing and mentoring teams.

  • Knowledge of the Linux hardware ecosystem, centered around CPU, GPU, and PCIe architecture.

  • Deep understanding of Linux operating systems and networking, with practical experience in tuning HPC workloads.

  • Experience with configuration management and automation tools, such as Chef, Ansible, Salt, and Packer.

  • Experience building monitoring and alerting on logs and metrics, with excellent troubleshooting and analytical skills.

  • Experience with and a strong understanding of containers (Docker/Singularity). Container orchestration with Kubernetes is a plus.

  • Maintain a grounded approach, making decisions based on data and strategic goals rather than emotions, and clearly articulate those decisions.

  • International travel a couple of times a year will be required.

Minimum Qualifications:

  • Engineering degree (preferably Computer Science or Computer Engineering).

  • Experience working with HPC Technologies.
