Researcher - Reinforcement Learning

Huawei Technologies Canada Co., Ltd.

Edmonton

On-site

CAD 80,000 - 120,000

Full time

30+ days ago

Job summary

A leading technology firm in Canada seeks a Reinforcement Learning Researcher to advance research in artificial intelligence. The ideal candidate will hold a PhD in Computer Science or a related field and have a strong foundation in deep learning and reinforcement learning. Responsibilities include implementing training pipelines for large language models and publishing findings in top-tier venues. This is a 12-month contract with a focus on innovative AI solutions.

Qualifications

PhD in Computer Science or a related field, or a master’s degree with comparable experience.
Strong foundation in deep learning, including architectures such as Transformers.
Practical or research experience in reinforcement learning or language model fine‑tuning.

Responsibilities

Enable LLMs to learn from experience and feedback.
Build and assess training and evaluation pipelines for LLMs.
Contribute to scientific insights and publications.

Skills

Deep learning fundamentals

Reinforcement learning

Python programming

Communication skills

Education

PhD in Computer Science or related fields

Tools

PyTorch

DeepSpeed

Megatron

Huawei Canada has an immediate 12-month contract opening for a Reinforcement Learning Researcher.

About the team:

Founded in 2012, the Noah’s Ark Lab has evolved into a prominent research organization with notable achievements in academia and industry. The lab’s mission focuses on advancing artificial intelligence and related fields to benefit the company and society. Driven by impactful, long‑term projects, the lab aims to advance state‑of‑the‑art research and integrate its innovations into the company’s products and services, in areas including LLMs, RL, NLP, computer vision, AI theory, and autonomous driving.

About the job:

Enabling Large Language Models (LLMs) to learn from experience, interaction, and environment feedback, moving beyond static fine‑tuning toward continual, agentic self‑improvement.
LLM post‑training paradigms (e.g., RLHF, GRPO, reward‑free methods).
Agentic reinforcement learning for tool‑using and browsing‑based LLMs trained in interactive environments.
Agentic evaluation and benchmarking, including design of multi‑turn, verifiable reasoning tasks.
Your work will involve implementing and evaluating new training and evaluation pipelines for reasoning‑enhanced LLMs and tool‑using agents, scaling experiments on large GPU clusters, and contributing to scientific insights and publications in this emerging area.

About the ideal candidate:

PhD in Computer Science or a related field, or a master’s degree with comparable experience.
Strong foundation in deep learning, including architectures such as Transformers and optimization techniques for large models.
Practical or research experience in reinforcement learning, self‑supervised learning, or language model fine‑tuning.
Proven research record in AI, with at least one first‑author paper at a top‑tier venue such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, or ICRA.
Solid proficiency in Python and experience with PyTorch, DeepSpeed, Megatron and other distributed training frameworks.
Familiarity with LLM post‑training pipelines (RLHF, GRPO/PPO, SFT, LoRA, MoE, etc.) is an asset.
Experience with multi‑agent RL or with tool‑use, browser, or coding agents is an asset.
Strong communication and writing skills; enthusiasm for open research and collaborative problem‑solving.
