Research Scientist (Test Time Compute)

Naptha AI

Vancouver

Remote

CAD 90,000 - 130,000

Full time

Job summary

A leading company in AI research is seeking an AI Research Scientist to advance test-time compute optimization for large language models. The ideal candidate will have a strong background in machine learning, experience with model compression, and excellent programming skills. The role involves research, development, and cross-team collaboration to solve technical challenges in model deployment, and offers a competitive salary and benefits in a remote-first environment.

Benefits

Competitive salary with equity
Remote-first environment
Medical, dental, vision coverage
Flexible PTO
Learning and development budget
Conference and publication support
Home office setup allowance

Qualifications

  • Strong background in machine learning and systems optimization.
  • Deep understanding of model compression and inference techniques.
  • Experience with ML frameworks and deployment tools.

Responsibilities

  • Design and implement novel architectures for efficient model inference.
  • Research approaches to optimize test-time computation across hardware.
  • Collaborate with engineering team on implementation.

Skills

Machine Learning
Systems Optimization
Model Compression
Inference Techniques
Analytical Skills
Problem Solving
Programming (Python)

Education

PhD in a relevant field or equivalent experience

Tools

PyTorch
TensorFlow

Job description

AI Research Scientist (Test Time Compute) | naptha.ai

About The Role

We are seeking an exceptional AI Research Scientist to join Naptha AI on the ground floor, focusing on advancing the state of the art in test-time compute optimization for large language models. In this role, you will research and develop novel approaches to improve inference efficiency, reduce computational requirements, and enhance model performance at deployment. You will work directly with our technical team to shape the architecture of our inference optimization platform.

This position addresses core technical challenges related to model compression, efficient inference strategies, and deployment optimization. You will operate at the intersection of machine learning, systems optimization, and hardware acceleration to develop practical solutions for real-world model deployment and scaling.

Core Responsibilities
  1. Research & Development
    • Design and implement novel architectures for efficient model inference
    • Develop frameworks for model compression and quantization
    • Research approaches to optimize test-time computation across hardware
    • Create protocols for distributed inference and resource management
    • Implement and test new ideas through rapid prototyping
  2. Technical Innovation
    • Stay updated on developments in ML efficiency and inference optimization
    • Identify and solve technical challenges in model deployment
    • Develop novel approaches to model compression and acceleration
    • Bridge theoretical research with practical implementation
    • Contribute to the academic community via publications and open source
  3. Platform Development
    • Design and implement efficient inference pipelines
    • Develop scalable solutions for model deployment and serving
    • Create tools for performance monitoring and optimization
    • Collaborate with engineering team on implementation
    • Build proofs of concept for new optimization techniques
  4. Leadership & Collaboration
    • Work with engineering team to implement research findings
    • Mentor team members on optimization techniques
    • Contribute to technical strategy and roadmap
    • Collaborate with external research partners
    • Evaluate and integrate external research developments

Candidate Profile

Ideal candidates will have:

  • Strong background in machine learning and systems optimization
  • Deep understanding of model compression and inference techniques
  • Experience with ML frameworks and deployment tools
  • Experience with ML infrastructure and hardware acceleration
  • Proven track record of implementing efficient ML systems
  • Excellent programming skills (Python required, C++/CUDA a plus)
  • Strong analytical and problem-solving skills
  • PhD in a relevant field or equivalent experience is a plus
  • Published research is a plus

Technical Experience Needed
  • Python programming and ML frameworks (PyTorch, TensorFlow)
  • Model optimization techniques (quantization, pruning, distillation)
  • MLOps and deployment
  • Hardware acceleration (GPU, TPU)
  • Version control and collaborative development
  • Experience with large language models
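To give applicants a concrete sense of the model optimization techniques listed above, here is a minimal, framework-free sketch of symmetric per-tensor int8 weight quantization, one of the core ideas behind post-training quantization. The function names and values are illustrative only, not part of Naptha AI's stack:

```python
# Illustrative sketch: symmetric int8 quantization of a weight tensor,
# using a single per-tensor scale. Real systems (e.g. PyTorch's
# quantization tooling) add per-channel scales, zero points, and
# calibration, but the core mapping is the same.

def quantize_int8(weights):
    """Map floats to int8 codes with one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

# Toy weight vector; in practice this would be a large layer's weights.
weights = [0.5, -1.27, 0.03, 1.0]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)
```

Weights are stored at 8 bits instead of 32, at the cost of a small, bounded reconstruction error per element.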

Hiring Process
  • Initial technical interview
  • Research presentation
  • System design discussion
  • Technical challenge
  • Team collaboration interview

Compensation & Benefits
  • Competitive salary with equity
  • Remote-first environment
  • Medical, dental, vision coverage
  • Flexible PTO
  • Learning and development budget
  • Conference and publication support
  • Home office setup allowance

Additional Notes
  • Comfort with ambiguity and rapid iteration
  • Practical approach to research ideas
  • Passion for advancing efficient ML systems
  • Interest in open source community

Naptha AI is committed to diversity and inclusion. We are an equal opportunity employer and welcome applications from all qualified candidates.