Senior Engineer - Large Model and Training System Performance Optimization

Huawei Technologies Canada Co., Ltd.

Burnaby

On-site

CAD 121,000 - 230,000

Full time

30+ days ago

Job summary

A leading tech company in Canada is seeking a Senior Engineer to innovate in AI and deep learning optimization. Responsibilities include researching AI algorithms, publishing findings, and collaborating globally. The ideal candidate holds a Master's or PhD in a relevant field and has 2+ years of experience optimizing the training of deep learning models. A base salary between $121,000 and $230,000 is offered, based on expertise and experience.

Qualifications

2+ years of work experience optimizing the training performance of deep learning models.
Hands-on experience with veRL or Ray for large-scale model training.
Familiarity with processor architectures and experience designing complex system software.

Responsibilities

Track AI theory and technology trends to produce research reports.
Lead research on algorithms for AI model training optimization.
Publish AI research papers and attend conferences.
Collaborate with global research teams.
Assist in project planning and technology road map definition.

Skills

AI & Deep Learning

C/C++

Python

Performance Optimization

Documentation Skills

Communication Skills

Education

Master’s or PhD in Computer Science, Math/Statistics

Tools

PyTorch

Nsight Systems

Nsight Compute

DLProf

Huawei Canada has an immediate permanent opening for a Senior Engineer.

About the team:

The Computing Data Application Acceleration Lab aims to create a leading global data analytics platform. The lab is organized into three specialized teams that use innovative programming technologies. This team focuses on full-stack innovations, including software-hardware co-design and optimizing data efficiency at both the storage and runtime layers. This team also develops next-generation GPU architecture for gaming, cloud rendering, VR/AR, and Metaverse applications.

One of the goals of this lab is to enhance algorithm performance and training efficiency across industries, fostering long-term competitiveness.

About the job:

Track global trends in AI theory and technology, and produce research reports and proposals for promoting the Ascend system accordingly.
Lead or participate in research on algorithms that accelerate the training of market-driven AI models (CV/NLP/GNN/…) while matching or exceeding state-of-the-art accuracy, and develop proofs of concept for those algorithms. These include, but are not limited to, optimizers, loss functions, new model architectures, mixed precision, model compression, and learning technologies (e.g., meta-learning).
Publish relevant high-quality AI research papers when necessary and approved, and attend conferences to increase public awareness of Huawei’s Ascend products; file high-value patents on critical algorithms/processes with potential business value.
Collaborate with other departments and teams across Huawei’s global research centers.
Assist the team lead with project planning and the definition of the technology/product development road map.

The base salary for this position ranges from $121,000 to $230,000 depending on education, experience and demonstrated expertise.

About the ideal candidate:

Master’s or PhD in Computer Science, Math/Statistics, with a focus on AI & Deep Learning.
2+ years of work experience optimizing the training performance of deep learning models and/or their applications in domains such as CV, NLP, or GNN. A proactive attitude with a strong ability to tackle challenges and adapt to evolving requirements and a dynamic work environment.
Excellent documentation skills for writing internal reports and/or publishing research papers. Effective communication skills for presentations to internal and external audiences.
Working knowledge of AI accelerators or full-stack AI acceleration systems and Deep Reinforcement Learning.
Hands-on experience with veRL or Ray for large-scale model training.

Familiarity with processor architectures and relevant work experience, with hands-on expertise in designing and developing complex system software architectures, and experience in performance optimization on GPU/NPU or similar hardware platforms.
Solid understanding of deep learning fundamentals, proficiency with the PyTorch framework, and practical experience in performance optimization using upper-layer distributed frameworks such as Megatron or DeepSpeed.
Strong programming skills with proficiency in C/C++ and Python.
Experience using performance analysis tools such as Nsight Systems, Nsight Compute, and DLProf.
