Senior ML Infrastructure Engineer

ZipRecruiter

San Francisco (CA)

On-site

USD 180,000 - 250,000

Full time

24 days ago

Job summary

Kuzco is seeking a Senior ML Infrastructure Engineer to develop large-scale, fault-tolerant systems for its distributed LLM inference network. Ideal candidates will have strong programming skills, a deep understanding of distributed systems, and a passion for building next-generation ML systems.

Benefits

Equity in a high-growth startup
Comprehensive benefits

Qualifications

  • 5+ years of experience in building high-performance systems.
  • Solid understanding of distributed systems concepts.
  • Experience with LLM inference engines is a plus.

Responsibilities

  • Design and implement scalable distributed systems for the inference network.
  • Optimize network latency, throughput, and availability.
  • Build robust logging and metrics systems to monitor network health.

Skills

Problem-solving
Distributed systems
Programming in TypeScript
Programming in Python
Programming in Go
Programming in Rust
Programming in C++
AI tooling
GPU programming

Tools

Kubernetes
Nomad
vLLM
TensorRT-LLM
CUDA

Job Description

Kuzco is seeking a Senior ML Infrastructure Engineer to join our team. This role involves developing large-scale, fault-tolerant systems that handle millions of large model inference requests per day. If you are passionate about developing next-generation ML systems that operate at scale, we want to hear from you.

About Kuzco

We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute that can be used for running large models like Llama and Mistral. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.

We are a small, well-funded team of staff-level engineers who work in-person in downtown San Francisco on difficult, high-impact engineering problems. Everyone on the team has been writing code for over 10 years, and has founded and run their own software companies. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard, and deeply enjoy the work that we do; we are almost always online at least six days per week.

About the Role

You will be responsible for designing and implementing the core systems that power our globally distributed LLM inference network. You'll work on problems at the intersection of distributed systems, machine learning, and resource optimization.

Key Responsibilities

  1. Design and implement scalable distributed systems for our inference network
  2. Develop models for efficient resource allocation across a network with heterogeneous hardware and a quickly changing topology
  3. Optimize network latency, throughput, and availability
  4. Build robust logging and metrics systems to monitor network health and performance
  5. Conduct reviews of architecture and system design to ensure use of best practices
  6. Collaborate with founders, engineers, and other stakeholders to improve our infrastructure and product offerings

What We're Looking For

  • Very strong problem-solving skills and ability to work in a startup environment
  • 5+ years of experience in building high-performance systems
  • Strong programming skills in TypeScript, Python, and one of Go, Rust, or C++
  • Solid understanding of distributed systems concepts
  • Knowledge of orchestrators and schedulers like Kubernetes and Nomad
  • Experience with AI tooling in development workflow (ChatGPT, Claude, Cursor, etc.)
  • Experience with LLM inference engines like vLLM or TensorRT-LLM is a plus
  • Experience with GPU programming and optimization (CUDA experience is a plus)

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 - $250,000, plus equity and benefits, depending on experience.

Equal Opportunity

Kuzco is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, gender, age, disability, ethnicity, or veteran status.

If you're excited about building the future of developer-first AI infrastructure, we'd love to hear from you. Please send your resume, LinkedIn, and GitHub to sam@kuzco.xyz.
