AI Inference Engineer

Perplexity AI

City Of London

On-site

GBP 60,000 - 80,000

Full time

30+ days ago

Job summary

Perplexity AI is seeking an AI Inference Engineer in London to develop and optimize AI inference APIs for real-time applications. The ideal candidate has experience with ML systems and deep learning frameworks such as PyTorch and TensorFlow. Responsibilities include improving system reliability and observability and exploring new LLM inference optimization techniques. Competitive compensation is offered.

Qualifications

  • Experience with deep learning frameworks (e.g., PyTorch, TensorFlow).
  • Familiarity with LLM architectures and optimization techniques.
  • Experience deploying distributed real-time model serving.

Responsibilities

  • Develop APIs for AI inference for internal and external use.
  • Benchmark and address bottlenecks in the inference stack.
  • Improve system reliability and observability.

Skills

Experience with ML systems and deep learning frameworks
Familiarity with common LLM architectures
Experience with deploying reliable model serving
Understanding of GPU architectures

Tools

PyTorch
TensorFlow
CUDA

Job description

Overview

Perplexity is an AI-powered answer engine founded in December 2022 and growing rapidly as one of the world’s leading AI platforms. Our objective is to build accurate, trustworthy AI that powers decision-making for people and assistive AI wherever decisions are being made. Our current stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference that will be used by both internal and external customers
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
  • Experience with deploying reliable, distributed, real-time model serving at scale
  • (Optional) Understanding of GPU architectures or experience with GPU kernel programming using CUDA