Software Engineer - Cloud Engineering, Kubernetes

Kumo

Mountain View (CA)

On-site

USD 98,000 - 126,000

Full time

11 days ago

Job summary

Join a forward-thinking company as a Software Engineer in Cloud Engineering, where you'll architect resilient Kubernetes infrastructure for a cutting-edge AI platform. This role offers the chance to work with advanced technologies in a collaborative environment, driving automation and enhancing scalability across multi-cloud platforms. You'll play a crucial role in shaping the future of cloud-native applications, ensuring high availability and performance. If you're passionate about Kubernetes and eager to make an impact, this opportunity is perfect for you!

Benefits

Competitive salary and equity options
Comprehensive medical and dental insurance
Inclusive and diverse work environment

Qualifications

  • 5+ years of experience managing large-scale Kubernetes clusters.
  • Expertise in cloud-native infrastructure across AWS, Azure, and GCP.
  • Proficient in automating Kubernetes cluster management.

Responsibilities

  • Design and scale Kubernetes infrastructure for AI workloads.
  • Automate fleet management and optimize multi-cloud deployments.
  • Collaborate with ML engineers to enhance resource allocation.

Skills

Kubernetes Management
Cloud-Native Infrastructure
Platform Engineering
Software Development
Infrastructure-as-Code
Distributed Systems

Education

BS in Computer Science
MS in Computer Science
PhD in Computer Science

Tools

Terraform
Ansible
Docker
Prometheus
Grafana

Job description

The Cloud Infrastructure team at Kumo is responsible for managing and scaling our Kubernetes-based, cloud-native AI platform across multiple cloud providers. They set service level objectives, optimize resource allocation, enforce security compliance, and drive cost efficiency for the Multi-Cloud Platform.

As a key team member, you will architect and operate a highly scalable, resilient Kubernetes infrastructure to support massive Big Data and AI workloads. You’ll design and implement advanced cluster management strategies, scale fleet capacity, optimize workload scheduling, and enhance observability at scale. Your expertise in Kubernetes internals, networking, and performance tuning will be critical in ensuring high availability and seamless scaling.

Joining early, you'll play a pivotal role in shaping platform reliability, automating infrastructure, and equipping ML engineers with efficient commit-to-production automation, continuous provisioning, CI/CD, MLOps, and deployment orchestration workflows. You'll collaborate with ML scientists, product engineers, and leadership to influence scaling strategies, develop self-service tooling, and drive multi-cloud resilience. Engineers at Kumo take ownership of core system design, building infrastructure that powers the next generation of AI applications.

Key Responsibilities

  • Design, build, and scale Kubernetes-based infrastructure to support Kumo’s multi-cloud AI platform, ensuring high availability, resilience, and performance.
  • Architect and optimize large-scale Kubernetes clusters, improving scheduling, networking (CNI), and workload orchestration for production environments.
  • Develop and extend Kubernetes controllers and operators to automate cluster management, lifecycle operations, and scaling strategies.
  • Enhance observability, diagnostics, and monitoring by building tools for real-time cluster health tracking, alerting, and performance tuning.
  • Lead efforts to automate fleet management, optimizing node pools, autoscaling, and multi-cluster deployments across AWS, GCP, and Azure.
  • Define and implement Kubernetes security policies, RBAC models, and best practices to ensure compliance and platform integrity.
  • Collaborate with ML engineers and platform teams to optimize Kubernetes for machine learning workloads, ensuring seamless resource allocation for AI/ML models.
  • Drive commit-to-production automation, cloud connectivity, and deployment orchestration, ensuring seamless application rollouts, zero-downtime upgrades, and global infrastructure reliability.

Required Skills And Experience

  • Kubernetes Mastery: 5-7+ years of experience managing large-scale Kubernetes clusters (EKS, GKE, AKS, or open source) in production. Deep expertise in Kubernetes internals, including controllers, operators, scheduling, networking (CNI), and security policies.
  • Cloud-Native Infrastructure: 5-7+ years of experience building cloud-native Kubernetes-based infrastructure across AWS, Azure, and GCP.
  • Platform Engineering: 5-7+ years of experience building Kubernetes service meshes (Istio/Envoy, Traefik), network policies (Calico/Tigera), and distributed ingress/egress control.
  • Fleet Management & Scaling: Proven experience in optimizing, scaling, and maintaining Kubernetes clusters across multi-cloud environments, ensuring high availability and performance.
  • Software Development: 5-7+ years of experience writing production-grade controllers and operators in Python, Go, or Rust to extend Kubernetes functionality.
  • Infrastructure-as-Code & Automation: Hands-on experience with Terraform, CloudFormation, Ansible, and Bash and Make scripting to automate Kubernetes cluster provisioning and management.
  • Distributed Systems & SaaS: Expertise in building and operating large-scale distributed systems for cloud-native B2B SaaS applications running on Kubernetes.
  • Cloud Application Deployment: Deep expertise in building container orchestration, workload scheduling, and runtime optimizations using Kubernetes, Argo, or Flux.
  • Education: BS/MS in Computer Science or a related field (PhD preferred).

Nice to Have

  • Proficiency with cloud platforms such as AWS, GCP, or Azure.
  • Familiarity with chaos engineering tools and practices for testing system resilience.
  • Strong understanding of security best practices and compliance standards (GDPR, SOC2, ISO27001, vulnerability assessments, GRC, risk management).
  • Contributions to open-source projects, particularly in the Kubernetes or cloud-native ecosystem.
  • Expertise in Docker, Kubernetes, Jenkins, Flux, Argo, and Terraform in a Linux environment.
  • Hands-on experience with monitoring and observability tools such as Prometheus and Grafana.
  • Ability to develop customer-facing web frontends or public APIs/SDKs for platform services.

Benefits

  • Competitive salary and equity options.
  • Comprehensive medical and dental insurance.
  • An inclusive, diverse work environment where all employees are valued and supported.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Software Development
