Senior Inference ML Engineer — Sparse Attention & Pruning

Cerebras

Canada

Hybrid

CAD 100,000 - 150,000

Full time

Job summary

Cerebras is seeking a Senior Research Engineer to improve inference models for its hardware. The ideal candidate has advanced skills in Python or C++ and significant experience in machine learning. The role involves designing and optimizing transformer architectures, leading research on inference algorithms, and collaborating across teams. Applicants need a degree in computer science or a related field, with required experience ranging from 2+ years (PhD) to 7+ years (Bachelor's). This hybrid role can be based in Toronto, ON, or Sunnyvale, CA.

Benefits

Open-source AI research

Job stability with startup vitality

Non-corporate work culture

Qualifications

7+ years of ML software development experience.
4+ years of experience testing and maintaining software products.
3+ years of experience in machine learning software development.

Responsibilities

Design and implement transformer architectures for NLP and computer vision on Cerebras hardware.
Research novel inference algorithms and model architectures.
Profile and optimize model code to maximize throughput.

Skills

Programming skills in Python

Programming skills in C++

Experience with Generative AI

Understanding of transformer-based models

Education

Bachelor's degree in Computer Science or related field and 7+ years of experience

Master's degree in Computer Science or related field and 4+ years of experience

PhD in Computer Science or related field and 2+ years of experience

Tools

PyTorch

Transformers

vLLM

SGLang
