Member of Technical Staff, MLOps / ML Infrastructure

Palo Alto (CA)

On-site

USD 175,000 - 350,000

Full time

30+ days ago


Job summary

Join a forward-thinking organization as a Member of Technical Staff, where you will design and build robust ML infrastructure. This role is pivotal in ensuring seamless operations of machine learning workflows, from training to deployment. You will collaborate with a talented team to create scalable solutions that meet enterprise needs. If you have a passion for innovative technology and thrive in dynamic environments, this opportunity is perfect for you. Be part of a mission-driven company that values collaboration and creativity in building cutting-edge AI solutions.

Qualifications

  • Extensive experience in operating ML systems in production environments.
  • Strong open-source background with hands-on expertise in managing distributed clusters.

Responsibilities

  • Design and implement scalable ML infrastructure for machine learning workflows.
  • Build and manage control planes and tools for ML services operation.

Skills

Operating ML systems
Managing distributed clusters
Security best practices
Collaboration with ML researchers

Tools

Kubernetes
SLURM
Ray

Job description

Inflection AI is a public benefit corporation leveraging our world class large language model to build the first AI platform focused on the needs of the enterprise.
Who we are:

Inflection AI was re-founded in March 2024, and our leadership has assembled a team of kind, innovative, and collaborative individuals focused on building enterprise AI solutions. We are passionate about what we are building, enjoy working together, and strive to hire people with diverse backgrounds and experience.

Our first product, Pi, is an empathetic, conversational chatbot. Pi is a public instance built on our 350B+ frontier model together with our sophisticated fine-tuning (10M+ examples), inference, and orchestration platform. We are now focusing on building new systems that directly support the needs of enterprise customers using this same approach.

About the Role

As a Member of Technical Staff on our MLOps / ML Infrastructure team, you will be at the core of designing, building, and operating the systems that power our machine learning workflows—from model training to production deployment. Your work will be critical in developing control planes and robust tools around ML services, ensuring our platform is scalable, secure, and resilient. We’re looking for candidates with production operations experience, strong open-source backgrounds, and hands-on expertise in managing distributed clusters.

This is a good role for you if you:

  • Have extensive experience operating ML systems in production environments and building tools to manage them.
  • Are highly proficient in managing distributed clusters using Kubernetes (K8s), SLURM, and Ray.
  • Possess a strong open-source background, with experience at top-tier companies, and are comfortable leveraging community-driven tools as well as proprietary solutions.
  • Are security-aware and knowledgeable about best practices for safeguarding production systems, even though the role is not exclusively security-focused.
  • Thrive in dynamic, innovative environments where pushing the boundaries of ML infrastructure is a daily pursuit.

Responsibilities include:

  • Designing and implementing scalable ML infrastructure to support end-to-end machine learning workflows—from training and deployment to production operations.
  • Building and managing control planes and tools that ensure efficient, secure, and reliable operation of our ML services.
  • Collaborating closely with ML researchers, data scientists, and engineers to optimize system performance and resource utilization.
  • Leveraging distributed computing frameworks (Kubernetes, SLURM, Ray) to orchestrate ML workloads across diverse environments.
  • Continuously evaluating and integrating emerging technologies to enhance scalability, efficiency, and security across our ML systems.
  • Maintaining a strong security posture through the adoption of best practices in infrastructure design and operations.
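To give a flavor of the control-plane work described above, here is a deliberately simplified sketch of a job scheduler that fans training jobs out to a bounded worker pool. All names here (Job, ControlPlane, run_training) are hypothetical illustrations, not Inflection AI code; production systems of this kind are built on Kubernetes, SLURM, or Ray rather than a stdlib thread pool.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass


@dataclass
class Job:
    """A hypothetical training job tracked by the control plane."""
    name: str
    epochs: int
    status: str = "pending"


def run_training(job: Job) -> Job:
    # Stand-in for launching a real distributed training workload.
    job.status = "running"
    total = sum(range(job.epochs))  # placeholder "work"
    job.status = "succeeded" if total >= 0 else "failed"
    return job


class ControlPlane:
    """Tracks jobs and dispatches them to a bounded worker pool."""

    def __init__(self, max_workers: int = 4):
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.jobs: list[Job] = []

    def submit(self, job: Job):
        self.jobs.append(job)
        return self.pool.submit(run_training, job)

    def wait_all(self, futures):
        # Collect results as workers finish, in completion order.
        return [f.result() for f in as_completed(futures)]


cp = ControlPlane(max_workers=2)
futures = [cp.submit(Job(f"train-{i}", epochs=3)) for i in range(4)]
results = cp.wait_all(futures)
statuses = [j.status for j in results]
```

In a real deployment the `run_training` stand-in would launch a distributed workload (e.g. a Kubernetes Job, a SLURM batch script, or a Ray task), and the control plane would also handle retries, quotas, and audit logging.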
Employee Pay Disclosures

At Inflection AI, we aim to attract and retain the best employees and to compensate them in a way that appropriately and fairly values their individual contributions to the company. For this role, Inflection AI estimates that the starting annual base salary will fall in the range of approximately $175,000 to $350,000, depending on experience.

Interview Process

Apply: Submit your application on LinkedIn or our website for a specific role.

After speaking with one of our recruiters, you’ll enter our structured interview process, which includes the following stages:

  • Hiring Manager Conversation: An initial discussion with the hiring manager to assess fit and alignment.
  • Technical Interview: A deep dive with an Inflection Engineer to evaluate your technical expertise.
  • Domain-specific Interview
  • Final Conversation with the Hiring Manager

Depending on the role, we may also ask you to complete a take-home exercise or deliver a presentation.

Decision Timeline: We aim to provide feedback within one week of your final interview.
