AI Inference Engineer

Perplexity

London

On-site

USD 190,000 - 240,000

Full time

2 days ago

Job summary

An innovative company is looking for an AI Inference Engineer to join its team. This role focuses on the large-scale deployment of machine learning models for real-time inference. You will develop APIs for AI inference, optimize the inference stack, and ensure system reliability. The ideal candidate has a strong background in Python and C++, along with experience in ML systems and deep learning frameworks. The position offers a competitive salary and comprehensive benefits, making it an exciting opportunity for those passionate about AI and machine learning.

Benefits

Equity
Comprehensive health insurance
Dental insurance
Vision insurance
401(k) plan

Skills

Python
C++
TensorRT-LLM
Kubernetes
ML systems
deep learning frameworks
LLM architectures
inference techniques
GPU architectures
CUDA kernel programming

Job description

AI Inference Engineer

We are seeking an AI Inference Engineer to join our expanding team. Our current technology stack includes Python, C++, TensorRT-LLM, and Kubernetes. This role offers the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities
  • Develop APIs for AI inference used by internal and external customers
  • Benchmark and optimize the inference stack to address bottlenecks
  • Enhance system reliability and observability, and respond to outages
  • Research and implement optimizations for LLM inference

Qualifications
  • Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
  • Knowledge of LLM architectures and inference techniques (e.g., batching, quantization)
  • Experience deploying reliable, distributed, real-time model serving at scale
  • (Optional) Understanding of GPU architectures or CUDA kernel programming

The compensation range for this role is $190,000 - $240,000. Additional benefits include equity, comprehensive health insurance, dental, vision, and a 401(k) plan.
