Software Engineer - AI System & Infrastructure

Huawei Technologies Canada Co., Ltd.

Burnaby

On-site

CAD 110,000 - 210,000

Full time

16 days ago

Job summary

A leading technology firm in Burnaby is seeking an Engineer to innovate AI infrastructure solutions. Responsibilities include identifying scalability and performance issues in current systems and initiating projects to address them. The ideal candidate has a Master's or PhD degree in Computer Science or Computer Engineering and experience building high-performance distributed systems. The base salary ranges from $110,000 to $210,000, depending on qualifications.

Qualifications

  • Master's/PhD degree in Computer Science or Computer Engineering.
  • Experience in building large-scale, high-performance distributed systems.
  • Experience with NVIDIA TensorRT and/or Triton servers.
  • Knowledge & experience in distributed system design & development.
  • Work experience in technologies like vLLM, Ray, SGLang, Kubernetes.
  • Experience in programming languages: C/C++, Go, Java, Rust, Python, C#.
  • Excellent interpersonal and communication skills.

Responsibilities

  • Understand the AI system and infrastructure technology landscape.
  • Initiate and charter innovation projects for the AI infrastructure platform.
  • Provide scalable and high-performance architecture design.
  • Collaborate with teams to deliver project features improving scalability and performance.

Skills

Building large-scale, high-performance distributed systems
NVIDIA TensorRT
Container virtualization technologies
Distributed system design & development
Serverless technologies
Programming in C/C++, Go, Java, Rust, Python, C#
Excellent interpersonal and communication skills

Education

Master's/PhD degree in Computer Science or Computer Engineering

Tools

Kubernetes
TensorRT-LLM
PyTorch framework
CUDA libraries
GPU technologies

Job description

Huawei Canada has an immediate permanent opening for an Engineer.

About the team:

The Intelligent Cloud Infrastructure Lab aims to innovate technologies, algorithms, systems, and platforms for next-generation cloud infrastructure. The lab addresses scalability, performance, and resource utilization challenges in existing cloud services while preparing for future challenges with appropriate technologies and architectures. Additionally, the lab aims to understand industry dynamics and technology trends to create a robust ecosystem.

About the job:
  • Understand the AI system and infrastructure technology landscape, and identify scalability and performance issues or challenges in current LLM and multi-modal LLM systems.
  • Initiate and charter innovation projects to build or re-architect the AI infrastructure platform, and plan milestones accordingly.
  • Provide or contribute to a scalable, high-performance architecture design or redesign for the infrastructure system, optimized for AI training and inference. This includes, but is not limited to, cluster management and scheduling, LLM deployment, elastic LLM, and AI container cold/warm start-up optimization.
  • Collaborate with internal and external teams to deliver projects or project features that improve overall system scalability and performance.

The base salary for this position ranges from $110,000 to $210,000 depending on education, experience and demonstrated expertise.


About the ideal candidate:
  • Master's/PhD degree in Computer Science or Computer Engineering.
  • Experience in building large-scale, high-performance distributed systems.
  • Experience with NVIDIA TensorRT and/or Triton servers, and with container virtualization technologies.
  • Knowledge of and experience in distributed system design and development, including serverless technologies.
  • Work experience in one or more of the following technologies: vLLM, Ray, SGLang, Kubernetes, TensorRT-LLM, PyTorch framework, CUDA libraries, GPU technologies.
  • Work experience in one or more of the following programming languages: C/C++, Go, Java, Rust, Python, C#.
  • Excellent interpersonal and communication skills to collaborate with multiple teams and build strong partnerships effectively.
  • Demonstrated success working on software engineering problems that span multiple products.