Senior Engineer - Large Model and Training System Performance Optimization

Huawei Technologies Canada Co., Ltd.

Burnaby

On-site

CAD 121,000 - 230,000

Full time

30+ days ago

Job summary

A leading tech company in Canada is seeking a Senior Engineer to innovate in AI and deep learning optimization. Responsibilities include researching AI algorithms, publishing findings, and collaborating globally. The ideal candidate holds a Master's or PhD in a relevant field and has 2+ years in optimizing deep learning models. A base salary between $121,000 and $230,000 is offered, based on expertise and experience.

Qualifications

  • 2+ years of work experience optimizing the training performance of deep learning models.
  • Hands-on experience with veRL or Ray for large-scale model training.
  • Familiarity with processor architectures and experience designing complex system software.

Responsibilities

  • Track AI theory and technology trends to produce research reports.
  • Lead research on algorithms for AI model training optimization.
  • Publish AI research papers and attend conferences.
  • Collaborate with global research teams.
  • Assist in project planning and technology road map definition.

Skills

AI & Deep Learning
C/C++
Python
Performance Optimization
Documentation Skills
Communication Skills

Education

Master’s or PhD in Computer Science, Math/Statistics

Tools

PyTorch
Nsight Systems
Nsight Compute
DLProf

Job description

Huawei Canada has an immediate permanent opening for a Senior Engineer.

About the team:

The Computing Data Application Acceleration Lab aims to create a leading global data analytics platform organized into three specialized teams using innovative programming technologies. This team focuses on full-stack innovations, including software-hardware co-design and optimizing data efficiency at both the storage and runtime layers. This team also develops next-generation GPU architecture for gaming, cloud rendering, VR/AR, and Metaverse applications.

One of the goals of this lab is to enhance algorithm performance and training efficiency across industries, fostering long-term competitiveness.

About the job:

  • Track worldwide trends in AI theory and technology development, and produce research reports and proposals for promoting the Ascend system accordingly.

  • Lead or participate in research on algorithms that accelerate the training of market-driven AI models (CV/NLP/GNN/…) while reaching or exceeding state-of-the-art accuracy, and develop proofs of concept for those algorithms. These algorithms include, but are not limited to: optimizers, loss functions, new model architectures, mixed precision, model compression, and learning technologies (e.g., meta-learning).

  • Publish relevant high-quality AI research papers when necessary and approved, and attend conferences to increase public awareness of Huawei’s Ascend products; file high-value patents on critical algorithms/processes with potential business value.

  • Team up with other departments/teams from Huawei’s global research centers for collaboration.

  • Assist the team lead with project planning and the definition of the technology/product development road map.

The base salary for this position ranges from $121,000 to $230,000 depending on education, experience and demonstrated expertise.


About the ideal candidate:

  • Master’s or PhD in Computer Science, Math/Statistics, with a focus on AI & Deep Learning.

  • 2+ years of work experience optimizing the training performance of deep learning models and/or their applications in domains such as CV, NLP, or GNN. A proactive attitude with a strong ability to tackle challenges and adapt to evolving requirements and a dynamic work environment.

  • Excellent documentation skills for writing internal reports and/or publishing research papers. Effective communication skills for presentations to internal and external audiences.

  • Working knowledge of AI accelerators or full-stack AI acceleration systems and Deep Reinforcement Learning.

  • Hands-on experience with veRL or Ray for large-scale model training.

  • Familiarity with processor architectures and relevant work experience, with hands-on expertise in designing and developing complex system software architectures, and experience in performance optimization on GPU/NPU or similar hardware platforms.

  • Solid understanding of deep learning fundamentals, proficiency with the PyTorch framework, and practical experience in performance optimization using upper-layer distributed frameworks such as Megatron or DeepSpeed.

  • Strong programming skills with proficiency in C/C++ and Python.

  • Experience using performance analysis tools such as Nsight Systems, Nsight Compute, and DLProf.
