Staff Platform Engineer, MLOps - USA

Inworld

Mountain View (CA)

Hybrid

USD 180,000 - 280,000

Full time

12 days ago

Job summary

An innovative firm is seeking a Staff Platform Engineer (MLOps) to enhance their AI infrastructure. In this role, you will collaborate with backend and ML teams to design and maintain high-performance cloud systems. Your expertise will drive the ML model lifecycle, implement CI/CD systems, and ensure reliability across services. This position offers a unique opportunity to work with cutting-edge technology and contribute to the development of real-time interactive experiences. Join a forward-thinking company that values engineering excellence and fosters a culture of continuous improvement.

Qualifications

  • 7+ years in software engineering with a focus on MLOps.
  • Expertise in Kubernetes and CI/CD for ML workflows.

Responsibilities

  • Develop and optimize the ML model lifecycle using the Inworld AI platform.
  • Manage CI/CD pipelines for seamless code integration.

Skills

Software Engineering
Infrastructure-as-Code
Kubernetes Management
CI/CD Pipelines
Cloud Provider Knowledge
Backend Programming (Golang, Python, Bash)
Open Source LLM Familiarity
Data Pipeline Management

Tools

Terraform
GitHub Actions
Ansible
SLURM

Job description

Why Join Inworld

Inworld is the leading provider of AI technology for real-time interactive experiences, with a $500 million valuation and backing from top-tier investors including Intel Capital, Microsoft’s M12 fund, Lightspeed Venture Partners, Section 32, BITKRAFT Ventures, Kleiner Perkins, Founders Fund, and First Spark Ventures.

Inworld provides the market’s best framework for building production ready interactive experiences, coupled with dedicated services to optimize specific stages of development – from design and development, to ML pipeline optimization and custom compute infrastructure. We help developers bring their AI engines in-house with a framework optimized for real-time data ingestion, low latency, and massive scale. Inworld powers experiences built by Ubisoft, NVIDIA, Niantic, NetEase Games and LG, among others, and has partnerships with key industry players such as Microsoft Xbox, Epic Games, and Unity.

Inworld was recognized by CB Insights as one of the 100 most promising AI companies in the world in 2024 and was named among LinkedIn's Top Startups of 2024 in the USA.

About the role:

As a Staff Platform Engineer (MLOps), you'll work closely with backend and ML Engineering teams to design, deploy, and maintain reliable, high-performance, and secure cloud infrastructure for our AI Engine and Studio.

What you'll do:
  • Develop, manage, and optimize the ML model lifecycle in production using the Inworld AI platform and NVIDIA CUDA, implementing CI/CD systems for ML workflows, monitoring models to identify issues and inefficiencies, and designing MLOps tools and frameworks to enhance automation and efficiency.
  • Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services.
  • Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment.
  • Identify and implement opportunities to enhance engineering speed and efficiency.
  • Conduct root cause analysis to identify critical issues and develop automated solutions to prevent recurrence.
  • Develop and share best practices to improve automation and efficiency across our engineering teams.
Expected experience:
  • 7 years of experience in software engineering.
  • 5 years of experience with infrastructure-as-code.
  • Proficiency in managing Kubernetes clusters and applications, including creating Helm charts/Kustomize manifests for new applications.
  • Experience in creating and maintaining CI/CD pipelines for both applications and infrastructure deployments (using tools like Terraform/Terragrunt, ArgoCD, GitHub Actions, Ansible, etc.).
  • Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud).
  • Proficient in at least one backend programming/scripting language, such as Golang, Python, or Bash.
  • Familiarity with open-source LLMs and open-source serving solutions (e.g., vLLM, llama.cpp, KServe) is a plus.
  • Experience with SLURM.
  • Experience with data pipeline and workflow management tools.
  • Experience with bare-metal GPUs (optional).

In-office location: Mountain View, CA, United States. You must be available for hybrid work.

The US base salary range for this full-time position is $180,000 - $280,000. In addition to base pay, total compensation includes equity and benefits. Within the range, individual pay is determined by work location, level, and additional factors, including competencies, experience, and business needs. The base pay range is subject to change and may be modified in the future.

Inworld Jobs Privacy
