Software Engineer - AI System & Infrastructure

Huawei Canada

Vancouver

On-site

CAD 78,000 - 168,000

Full time

3 days ago

Job summary

An innovative firm is seeking an Engineer for a 12-month contract to enhance AI infrastructure. This role involves identifying challenges in current AI systems, initiating innovation projects, and designing high-performance architectures. The ideal candidate will possess advanced degrees in Computer Science or Engineering, with experience in distributed systems and technologies like Nvidia TensorRT and Kubernetes. This position offers a competitive salary and the opportunity to contribute to cutting-edge cloud infrastructure solutions.

Qualifications

  • Master's or PhD degree in Computer Science or Computer Engineering required.
  • Experience in building large-scale, high-performance distributed systems.

Responsibilities

  • Identify scalability/performance issues in AI infrastructure.
  • Collaborate with teams to deliver projects enhancing system performance.

Skills

Distributed System Design
High-Performance Computing
Interpersonal Communication
AI Infrastructure Optimization
Problem-Solving

Education

Master's Degree in Computer Science
PhD in Computer Engineering

Tools

NVIDIA TensorRT
Triton Servers
Kubernetes
PyTorch Framework
CUDA Libraries

Job description

Huawei Canada has an immediate 12-month contract opening for an Engineer.

About the team:

The Intelligent Cloud Infrastructure Lab aims to innovate technologies, algorithms, systems, and platforms for next-generation cloud infrastructure. The lab addresses scalability, performance, and resource utilization challenges in existing cloud services while preparing for future challenges with appropriate technologies and architectures. Additionally, the lab aims to understand industry dynamics and technology trends to create a robust ecosystem.

About the job:

  • Understand the AI system and infrastructure technology landscape, and identify scalability and performance issues or challenges in current LLM and multi-modal LLM systems.

  • Initiate and charter innovation projects to build or re-architect the AI infrastructure platform, and plan milestones accordingly.

  • Provide or contribute to a scalable, high-performance architecture design or redesign for the infrastructure system, optimized for AI training and inference. This includes, but is not limited to, cluster management and scheduling, LLM deployment, elastic LLM, and AI container cold/warm start-up optimization.

  • Collaborate with internal and external teams to deliver projects or project features that improve overall system scalability and performance.

The target annual compensation (based on 2080 hours per year) ranges from $78,000 to $168,000 depending on education, experience and demonstrated expertise.


About the ideal candidate:

  • Master's or PhD degree in Computer Science or Computer Engineering.

  • Experience in building large-scale, high-performance distributed systems.

  • Experience with NVIDIA TensorRT and/or Triton servers, and with container virtualization technologies.

  • Knowledge of and experience in distributed system design and development, including serverless technologies.

  • Work experience in one or more of the following technologies: vLLM, Ray, SGLang, Kubernetes, TensorRT-LLM, PyTorch framework, CUDA libraries, GPU technologies.

  • Work experience in one or more of the following programming languages: C/C++, Go, Java, Rust, Python, C#.

  • Excellent interpersonal and communication skills to collaborate with multiple teams and build strong partnerships effectively.

  • Demonstrated success working on software engineering problems that span multiple products.
