AI Inference Engineer

Perplexity

London

On-site

USD 190,000 - 240,000

Full time

2 days ago

Job summary

An innovative company is looking for an AI Inference Engineer to join its team. This role focuses on the large-scale deployment of machine learning models for real-time inference. You will develop APIs for AI inference, optimize the inference stack, and ensure system reliability. The ideal candidate has a strong background in Python and C++, along with experience in ML systems and deep learning frameworks. The position offers a competitive salary and comprehensive benefits, making it an exciting opportunity for those passionate about AI and machine learning.

Benefits

Equity
Comprehensive health insurance
Dental insurance
Vision insurance
401(k) plan

Skills

Python
C++
TensorRT-LLM
Kubernetes
ML systems
deep learning frameworks
LLM architectures
inference techniques
GPU architectures
CUDA kernel programming

Job description

AI Inference Engineer

We are seeking an AI Inference Engineer to join our expanding team. Our current technology stack includes Python, C++, TensorRT-LLM, and Kubernetes. This role offers the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities
  • Develop APIs for AI inference used by internal and external customers
  • Benchmark and optimize the inference stack to address bottlenecks
  • Enhance system reliability and observability, and respond to outages
  • Research and implement optimizations for LLM inference

Qualifications
  • Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
  • Knowledge of LLM architectures and inference techniques (e.g., batching, quantization)
  • Experience deploying reliable, distributed, real-time model serving at scale
  • (Optional) Understanding of GPU architectures or CUDA kernel programming

The compensation range for this role is $190,000 - $240,000. Additional benefits include equity, comprehensive health insurance, dental, vision, and a 401(k) plan.
