Staff Platform Engineer, MLOps

Inworld AI

Mountain View (CA)

On-site

USD 180,000 - 280,000

Full time

7 days ago

Job summary

Join a dynamic team at a leading AI technology provider and play a pivotal role as a Platform Engineer. You'll collaborate with talented backend and ML engineers to build, secure, and scale cutting-edge AI infrastructure. This role involves optimizing the ML model lifecycle, enhancing CI/CD systems, and ensuring high-performance cloud services. With a strong focus on innovation and collaboration, you'll help shape the future of interactive experiences in gaming. If you're passionate about AI and infrastructure, this is an exciting opportunity to make a significant impact in a rapidly evolving field.

Benefits

Medical insurance
Vision insurance
401(k)
Equity options
Flexible work hours

Qualifications

  • 7+ years of experience in software engineering.
  • 5 years of experience with infrastructure-as-code.
  • Proficiency in managing Kubernetes clusters.

Responsibilities

  • Develop and optimize the ML model lifecycle using the Inworld AI platform.
  • Manage CI/CD pipelines for applications and infrastructure deployments.
  • Conduct root cause analysis to identify and automate solutions.

Skills

Software Engineering
Infrastructure as Code
Kubernetes
CI/CD Pipelines
Backend Programming (Golang, Python, Bash)
Cloud Platforms (GCP, Azure, Oracle)
Monitoring and Automation

Education

Bachelor's Degree in Computer Science or related field

Tools

Terraform
ArgoCD
GitHub Actions
Ansible
NVIDIA CUDA
SLURM

Job description

Inworld is the leading provider of AI technology for real-time interactive experiences, with a $500 million valuation and backing from top-tier investors including Intel Capital, Microsoft’s M12 fund, Lightspeed Venture Partners, Section 32, BITKRAFT Ventures, Kleiner Perkins, Founders Fund, and First Spark Ventures.

Inworld provides the market’s best framework for building production-ready interactive experiences, coupled with dedicated services to optimize specific stages of development, from design and development to ML pipeline optimization and custom compute infrastructure. We help developers bring their AI engines in-house with a framework optimized for real-time data ingestion, low latency, and massive scale. Inworld powers experiences built by Ubisoft, NVIDIA, Niantic, NetEase Games, and LG, among others, and has partnerships with key industry players such as Microsoft Xbox, Epic Games, and Unity.

Inworld was recognized by CB Insights as one of the 100 most promising AI companies in the world in 2024 and was named among LinkedIn’s Top Startups of 2024 in the USA.

Join our dynamic team as a Platform Engineer and play a pivotal role in building, securing, and scaling our cutting-edge AI engine for games. Collaborate with backend and ML engineers to plan, deploy, and maintain services across major cloud providers using Terraform, ArgoCD and other tooling. As part of our small, collaborative team, your contributions will significantly shape our operations and innovation.

About the role

As a Staff Platform Engineer (MLOps), you'll work closely with backend and ML Engineering teams to design, deploy, and maintain reliable, high-performance, and secure cloud infrastructure for our AI Engine and Studio.

What you’ll do

  • Develop, manage, and optimize the production ML model lifecycle using the Inworld AI platform and NVIDIA CUDA: implement CI/CD systems for ML workflows, monitor models to identify issues and inefficiencies, and design MLOps tools and frameworks that improve automation and efficiency.
  • Facilitate a "you build it, you run it" culture by providing the necessary tools and processes for monitoring the reliability, availability, and performance of services.
  • Manage CI/CD pipelines to ensure smooth and efficient code integration and deployment.
  • Identify and implement opportunities to enhance engineering speed and efficiency.
  • Conduct root cause analysis to identify critical issues and develop automated solutions to prevent recurrence.
  • Develop and share best practices to improve automation and efficiency across our engineering teams.

Requirements

  • 7+ years of experience in software engineering.
  • 5 years of experience with infrastructure-as-code.
  • Proficiency in managing Kubernetes clusters and applications, including creating Helm charts/Kustomize manifests for new applications.
  • Experience in creating and maintaining CI/CD pipelines for both applications and infrastructure deployments (using tools like Terraform/Terragrunt, ArgoCD, GitHub Actions, Ansible, etc.).
  • Deep knowledge of at least one major cloud provider (Google Cloud Platform, Microsoft Azure, Oracle Cloud).
  • Proficiency in at least one backend programming or scripting language, such as Golang, Python, or Bash.
  • Familiarity with open-source LLMs and open-source serving solutions (e.g., vLLM, llama.cpp, KServe) is a plus.
  • Experience with SLURM.
  • Experience with data pipeline and workflow management tools.
  • Experience with bare metal GPUs (optional).

Candidates must be based in the SF Bay Area or willing to relocate (you will be working on-site in our South Bay office a few days a week).

The base salary range for this full-time position is $180,000 to $280,000, plus bonus, equity, and benefits. Your recruiter can share more about the specific salary range for your targeted location during the hiring process.

Seniority level

Mid-Senior level

Employment type

Full-time

Industries

Software Development
