
A leading technology company is seeking an Inference Systems Engineer to enhance the serving runtime for production LLM inference. This deeply technical role involves optimizing system performance, collaborating with platform teams, and driving improvements in performance, stability, and efficiency. Candidates should have over 5 years of experience building high-performance systems, particularly in model serving or low-latency environments. Excellent communication and engineering hygiene are essential for success in this role.
Inference Systems Engineer
Remote
Infrastructure / Serving Systems
$5,651 - $6,469/month USD
As an Inference Systems Engineer, you will own the serving runtime that powers production LLM inference. This is a deeply technical role focused on system performance and stability: optimizing request lifecycle behavior, streaming correctness, batching/scheduling strategy, cache and memory behavior, and runtime execution efficiency. You will ship changes that improve time to first token (TTFT), p95/p99 latency, throughput, and cost efficiency while preserving correctness and reliability under multi-tenant load.
You will collaborate closely with platform/infrastructure operations, networking, and API/control-plane teams to ensure the serving system behaves predictably in production and can be debugged quickly when incidents occur. This role is for engineers who can reason about the entire inference pipeline, validate improvements with rigorous measurement, and operate with production-grade discipline.