Research Engineer - MLLM Serving Optimization

Huawei Technologies Canada Co., Ltd.

Burnaby

On-site

CAD 100,000 - 190,000

Full time

24 days ago

Job summary

Huawei Canada seeks a researcher to develop high-performance serving platforms for Multimodal Large Language Models (MLLMs) within its Intelligent Cloud Infrastructure Lab. The role involves designing, implementing, and optimizing serving frameworks and contributing to advances in AI efficiency, and it requires a strong background in computer science and related technologies.

Qualifications

  • Bachelor’s degree or higher in Computer Science, ECE, or a related field.
  • Strong expertise in PyTorch and experience with ML frameworks.
  • Familiarity with serverless architectures and cloud computing.

Responsibilities

  • Design, implement, and optimize a high-performance serving platform for MLLMs.
  • Conduct experiments to evaluate and benchmark serving performance.

Skills

Proficiency in PyTorch
Experience with SOTA LLM serving frameworks
Experience with inference optimization for large-scale AI models
Familiarity with distributed systems

Education

Bachelor's degree or higher in Computer Science or ECE

Tools

vLLM
sglang
lmdeploy

Job description

Huawei Canada has an immediate permanent opening for a researcher.

About the team:

The Intelligent Cloud Infrastructure Lab develops innovative technologies, algorithms, systems, and platforms for next-generation cloud infrastructure. It addresses scalability, performance, and resource-utilization challenges in existing cloud services while preparing for future challenges with appropriate technologies and architectures. The lab also tracks industry dynamics and technology trends to help build a robust ecosystem.

About the job:

  • Design, implement, and optimize a high-performance serving platform for MLLMs.

  • Integrate SOTA open-source serving frameworks such as vLLM, sglang, or lmdeploy.

  • Develop techniques for efficient resource utilization and low-latency inference for MLLMs in serverless environments.

  • Optimize memory usage, scalability, and throughput of the serving platform.

  • Conduct experiments to evaluate and benchmark MLLM serving performance.

  • Contribute novel ideas to improve serving efficiency and publish findings when applicable.

The base salary for this position ranges from $100,000 to $190,000, depending on education, experience, and demonstrated expertise.

About the ideal candidate:

  • Bachelor’s degree or higher in Computer Science, Electrical and Computer Engineering (ECE), or a related field.

  • Experience with one or more SOTA LLM serving frameworks such as vLLM, sglang, or lmdeploy.

  • Strong proficiency in PyTorch.

  • Familiarity with distributed systems, serverless architectures, and cloud computing platforms.

  • Experience with inference optimization for large-scale AI models.

  • Familiarity with multimodal architectures and serving requirements.

  • Previous experience in deploying AI platforms on cloud services.
