Principal Architect - Large Model and Training System Performance Optimization

Huawei

Vancouver

On-site

USD 121,000 - 230,000

Full time

30+ days ago

Job summary

An innovative firm is seeking a Principal Architect to lead the design of cutting-edge AI training systems. In this pivotal role, you will drive architectural competitiveness and collaborate with global teams to enhance algorithm performance. The ideal candidate will possess a Master's or PhD in Computer Science or Math/Statistics and have extensive experience in architecting large-scale AI solutions. This role offers an exciting opportunity to shape the future of AI technologies and contribute to groundbreaking projects in a dynamic environment, making a significant impact on the industry. If you are passionate about AI and eager to tackle complex challenges, this position is perfect for you.

Qualifications

  • 5+ years of experience in architecting large-scale AI training systems.
  • Master’s or PhD in Computer Science or Math/Statistics required.

Responsibilities

  • Lead architecture design of Ascend training products and drive innovation.
  • Collaborate with global teams to align on strategic goals.

Skills

AI & Deep Learning
Architecting large-scale AI training systems
C/C++ programming
Python programming
Documentation skills
Communication skills
Performance optimization
Deep Reinforcement Learning

Education

Master’s or PhD in Computer Science
Master’s or PhD in Math/Statistics

Tools

veRL
Ray
Nsight Systems
Nsight Compute
DLProf
PyTorch
Megatron
DeepSpeed

Job description

Huawei Canada has an immediate permanent opening for a Principal Architect.

About the team:

The Computing Data Application Acceleration Lab, organized into three specialized teams, aims to create a leading global data analytics platform using innovative programming technologies. This team focuses on full-stack innovations, including software-hardware co-design and optimizing data efficiency at both the storage and runtime layers. This team also develops next-generation GPU architecture for gaming, cloud rendering, VR/AR, and Metaverse applications.

One of the goals of this lab is to enhance algorithm performance and training efficiency across industries, fostering long-term competitiveness.

About the job:

  • Lead the architecture design of Ascend training products, driving the continuous evolution of architectural competitiveness.
  • Analyze mainstream scenario requirements and industry technology trends for Ascend, introducing innovative technologies to ensure sustained leadership in architectural competitiveness.
  • Identify requirements for MindX, AI frameworks, acceleration libraries, and chip hardware, building a robust software-hardware architecture for Ascend training to achieve ongoing commercial success.
  • Collaborate with other departments/teams from Huawei’s global research centers to align on strategic goals.
  • Spearhead project planning and define the technology/product development roadmap to guide long-term innovation.

The base salary for this position ranges from $121,000 to $230,000 depending on education, experience and demonstrated expertise.


About the ideal candidate:

  • Master’s or PhD in Computer Science or Math/Statistics, with a focus on AI & Deep Learning.
  • 5+ years of experience in architecting large-scale AI training systems or similar complex software-hardware integrated solutions.
  • Excellent documentation skills for writing internal reports and/or publishing research papers. Effective communication skills for presentations to internal and external audiences. A proactive attitude with a strong ability to tackle challenges and adapt to evolving requirements and a dynamic work environment.
  • Working knowledge of AI accelerators or full-stack AI acceleration systems and Deep Reinforcement Learning.
  • Hands-on experience with veRL or Ray for large-scale model training.
  • Familiarity with processor architectures and relevant work experience, with hands-on expertise in designing and developing complex system software architectures, and experience in performance optimization on GPU/NPU or similar hardware platforms.
  • Solid understanding of deep learning fundamentals, proficiency with the PyTorch framework, and practical experience in performance optimization using upper-layer distributed frameworks such as Megatron or DeepSpeed.
  • Strong programming skills with proficiency in C/C++ and Python.
  • Experience using performance analysis tools such as Nsight Systems, Nsight Compute, and DLProf.