Senior Inference ML Engineer — Sparse Attention & Pruning

Cerebras

Canada

Hybrid

CAD 100,000 - 150,000

Full time

Today

Job summary

A leading AI technology company is seeking a Senior Research Engineer to improve inference models for its hardware. The ideal candidate has advanced skills in Python or C++ and significant machine learning experience. The role involves designing and optimizing transformer architectures, leading research on inference algorithms, and collaborating across teams. Applicants need a degree in computer science or a related field and several years of hands-on experience. This hybrid role can be based in Toronto, ON, or Sunnyvale, CA.

Benefits

Open-source AI research
Job stability with startup vitality
Non-corporate work culture

Qualifications

  • 7+ years of ML software development experience.
  • Experience testing and maintaining software products for 4+ years.
  • 3+ years of experience in machine learning software development.

Responsibilities

  • Design and implement transformer architectures for NLP and computer vision on Cerebras hardware.
  • Research novel inference algorithms and model architectures.
  • Profile and optimize model code to maximize throughput.

Skills

Programming skills in Python
Programming skills in C++
Experience with Generative AI
Understanding of transformer-based models

Education

Bachelor's degree in Computer Science or related field and 7+ years of experience
Master's degree in Computer Science or related field and 4+ years of experience
PhD in Computer Science or related field with 2+ years of experience

Tools

PyTorch
Transformers
vLLM
SGLang