Platform engineer, MLOps - San Francisco, CA (hybrid)

Jobgether

San Francisco (CA)

Hybrid

USD 150,000 - 300,000

Full time

3 days ago

Job summary

Join a forward-thinking company as a Platform Engineer specializing in MLOps. This exciting role focuses on building and maintaining the infrastructure that powers AI and ML development. Collaborate with talented machine learning engineers to implement robust CI/CD pipelines and optimize large-scale training workloads. Your contributions will enhance operational excellence and system reliability in a high-growth environment. Enjoy generous benefits, including comprehensive medical coverage and flexible spending accounts, while working in a dynamic team that values innovation and collaboration.

Benefits

Generous paid time off
Comprehensive medical, dental, and vision insurance
12 weeks of paid parental leave
Fertility and family planning support
Early cancer detection screenings
Flexible spending accounts
Annual stipends for home office setup
Company-wide and team off-site events
Competitive salary and stock options
401(k) plan

Qualifications

  • 5+ years of experience building and managing core infrastructure.
  • Expertise in Kubernetes and Docker for ML workloads.

Responsibilities

  • Develop and manage CI/CD pipelines for ML experiments.
  • Operate and optimize large Kubernetes clusters for GPU workloads.

Skills

Kubernetes
Docker
Python
Bash
Git/GitHub
ML frameworks (PyTorch, Hugging Face)
Cloud platforms (GCP, AWS, Azure)
Terraform
Prometheus
Grafana

Education

Bachelor's Degree in Computer Science or related field

Tools

Kubernetes
Docker
Terraform
Prometheus
Grafana

Job description

Platform engineer, MLOps - San Francisco, CA (hybrid)

About Jobgether

Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.

One of our partner companies is currently looking for a Platform Engineer, MLOps in San Francisco, CA.

This role is focused on building and maintaining the infrastructure that powers AI/ML development and production environments. You'll collaborate closely with machine learning engineers and researchers to implement robust CI/CD pipelines, oversee container orchestration systems, and optimize large-scale training and inference workloads. Your work will ensure operational excellence, improve system reliability, and scale Kubernetes clusters supporting GPU-intensive tasks. This is a high-impact position for someone passionate about automating ML workflows and solving infrastructure challenges in fast-paced, high-growth environments.

Accountabilities:

  • Develop and manage CI/CD pipelines that support safe, reproducible machine learning experiments
  • Set up and monitor logging, alerting, and observability systems for model training and production APIs
  • Operate and optimize large Kubernetes clusters for GPU workloads
  • Manage containerization using Docker and orchestrate deployments via Kubernetes
  • Ensure high availability of training environments across distributed systems
  • Support and enhance the performance, scalability, and security of MLOps infrastructure
  • Troubleshoot complex systems and contribute to reliability improvements across ML platforms


Requirements

  • 5+ years of experience building and managing core infrastructure for large-scale systems
  • Deep hands-on experience with Kubernetes, Docker, and GPU workload orchestration
  • Expertise in cloud platforms (GCP, AWS, or Azure) and infrastructure-as-code tools (Terraform)
  • Proficiency with scripting (Python, Bash) and Git/GitHub workflows
  • Familiarity with ML frameworks and inference tooling such as PyTorch, Hugging Face Transformers, TensorRT, and vLLM
  • Experience with monitoring tools such as Prometheus, Grafana, or equivalent
  • Strong problem-solving skills and the ability to operate in a dynamic, ambiguous environment
  • Experience running inference clusters and managing CI/CD pipelines in ML-focused environments


Benefits

  • Generous paid time off and company holidays
  • Comprehensive medical, dental, and vision insurance for employees and dependents
  • 12 weeks of paid parental leave
  • Fertility and family planning support
  • Early cancer detection screenings through Galleri
  • Flexible spending accounts (FSA), dependent care FSA, and HSA with company contributions
  • Annual stipends for home office setup, phone/internet, wellness, and learning & development
  • Company-wide and team off-site events
  • Competitive salary, stock options, and 401(k) plan


Jobgether hiring process disclaimer

This job is posted on behalf of one of our partner companies. If you choose to apply, your application will go through our AI-powered 3-step screening process, where we automatically select the 5 best candidates.

Our AI thoroughly analyzes every line of your CV and LinkedIn profile to assess your fit for the role, evaluating each experience in detail. When needed, our team may also conduct a manual review to ensure only the most relevant candidates are considered.

Our process is fair, unbiased, and based solely on qualifications and relevance to the job. Only the best-matching candidates will be selected for the next round.

If you are among the top 5 candidates, you will be notified within 7 days.

If you do not receive feedback after 7 days, it means you were not selected. However, if you wish, we may consider your profile for other similar opportunities that better match your experience.

Thank you for your interest!

Seniority level
  • Mid-Senior level
Employment type
  • Full-time
Job function
  • Information Technology
Industries
  • Non-profit Organizations and Primary and Secondary Education
