Research Scientist (Test Time Compute)

Naptha AI

Vancouver

Remote

CAD 90,000 - 130,000

Full time

Job summary

A leading company in AI research is seeking an AI Research Scientist to advance test-time compute optimization for large language models. The ideal candidate will have a strong background in machine learning, experience with model compression, and excellent programming skills. The role involves research, development, and cross-team collaboration to solve technical challenges in model deployment, and offers a competitive salary and benefits in a remote-first environment.

Benefits

Competitive salary with equity
Remote-first environment
Medical, dental, vision coverage
Flexible PTO
Learning and development budget
Conference and publication support
Home office setup allowance

Qualifications

  • Strong background in machine learning and systems optimization.
  • Deep understanding of model compression and inference techniques.
  • Experience with ML frameworks and deployment tools.

Responsibilities

  • Design and implement novel architectures for efficient model inference.
  • Research approaches to optimize test-time computation across hardware.
  • Collaborate with engineering team on implementation.

Skills

Machine Learning
Systems Optimization
Model Compression
Inference Techniques
Analytical Skills
Problem Solving
Programming (Python)

Education

PhD in a relevant field or equivalent experience

Tools

PyTorch
TensorFlow

Job description

AI Research Scientist (Test Time Compute) | naptha.ai

About The Role

We are seeking an exceptional AI Research Scientist to join Naptha AI on the ground floor, focusing on advancing the state of the art in test-time compute optimization for large language models. In this role, you will research and develop novel approaches to improve inference efficiency, reduce computational requirements, and enhance model performance at deployment. You will work directly with our technical team to shape the architecture of our inference optimization platform.

This position addresses core technical challenges related to model compression, efficient inference strategies, and deployment optimization. You will operate at the intersection of machine learning, systems optimization, and hardware acceleration to develop practical solutions for real-world model deployment and scaling.

Core Responsibilities
  1. Research & Development
    • Design and implement novel architectures for efficient model inference
    • Develop frameworks for model compression and quantization
    • Research approaches to optimize test-time computation across hardware
    • Create protocols for distributed inference and resource management
    • Implement and test new ideas through rapid prototyping
  2. Technical Innovation
    • Stay updated on developments in ML efficiency and inference optimization
    • Identify and solve technical challenges in model deployment
    • Develop novel approaches to model compression and acceleration
    • Bridge theoretical research with practical implementation
    • Contribute to the academic community via publications and open source
  3. Platform Development
    • Design and implement efficient inference pipelines
    • Develop scalable solutions for model deployment and serving
    • Create tools for performance monitoring and optimization
    • Collaborate with engineering team on implementation
    • Build proofs of concept for new optimization techniques
  4. Leadership & Collaboration
    • Work with engineering team to implement research findings
    • Mentor team members on optimization techniques
    • Contribute to technical strategy and roadmap
    • Collaborate with external research partners
    • Evaluate and integrate external research developments

Candidate Profile

Ideal candidates will have:

  • Strong background in machine learning and systems optimization
  • Deep understanding of model compression and inference techniques
  • Experience with ML frameworks and deployment tools
  • Experience with ML infrastructure and hardware acceleration
  • Proven track record of implementing efficient ML systems
  • Excellent programming skills (Python required, C++/CUDA a plus)
  • Strong analytical and problem-solving skills
  • PhD in a relevant field or equivalent experience is a plus
  • Published research is a plus

Technical Experience Needed
  • Python programming and ML frameworks (PyTorch, TensorFlow)
  • Model optimization techniques (quantization, pruning, distillation)
  • MLOps and deployment
  • Hardware acceleration (GPU, TPU)
  • Version control and collaborative development
  • Experience with large language models
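To give applicants a concrete sense of the model optimization techniques listed above, here is a minimal, framework-free sketch of symmetric per-tensor int8 weight quantization, one of the core ideas behind post-training quantization. The function names and values are illustrative only, not part of Naptha AI's stack:

```python
# Illustrative sketch: symmetric int8 quantization of a weight tensor,
# using a single per-tensor scale. Real systems (e.g. PyTorch's
# quantization tooling) add per-channel scales, zero points, and
# calibration, but the core mapping is the same.

def quantize_int8(weights):
    """Map floats to int8 codes with one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

# Toy weight vector; in practice this would be a large layer's weights.
weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
```

Weights are stored at 8 bits instead of 32, at the cost of a small, bounded reconstruction error per element.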

Hiring Process
  • Initial technical interview
  • Research presentation
  • System design discussion
  • Technical challenge
  • Team collaboration interview

Compensation & Benefits
  • Competitive salary with equity
  • Remote-first environment
  • Medical, dental, vision coverage
  • Flexible PTO
  • Learning and development budget
  • Conference and publication support
  • Home office setup allowance

Additional Notes
  • Comfort with ambiguity and rapid iteration
  • Practical approach to research ideas
  • Passion for advancing efficient ML systems
  • Interest in open source community

Naptha AI is committed to diversity and inclusion. We are an equal opportunity employer and welcome applications from all qualified candidates.