Software Engineer, Infrastructure

Anyscale

San Francisco (CA)

On-site

USD 100,000 - 160,000

Full time

Yesterday

Job summary

Join a forward-thinking company dedicated to democratizing distributed computing. As a Software Engineer, you'll work on cutting-edge infrastructure that simplifies the development of distributed AI applications. You'll design and optimize systems that manage Ray clusters, ensuring high performance and reliability. Collaborate with leading experts in the field while contributing to open-source projects and proprietary products. This is an exciting opportunity to make a significant impact in a rapidly evolving tech landscape, where your skills will help shape the future of AI infrastructure.

Qualifications

  • 3+ years of experience writing high-quality production code.
  • Hands-on experience with scalable distributed systems.
  • Deep understanding of networking and security in cloud environments.

Responsibilities

  • Design and build services that orchestrate Ray clusters across environments.
  • Optimize control plane components for distributed AI/ML workloads.
  • Collaborate with experts to enhance AI infrastructure.

Skills

Go
Python
Kubernetes
AWS
Azure
GCP
Distributed Systems
Networking
Security
Observability (Prometheus, Grafana)

Education

Bachelor's degree in Computer Science

Tools

Kubernetes
AWS
Azure
GCP
Prometheus
Grafana

Job description

About Anyscale:

At Anyscale, we're on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We’re commercializing Ray, a popular open-source project that's creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify, Instacart, Cruise, and many more, have Ray in their tech stacks to accelerate the progress of AI applications out into the real world.

With Anyscale, we’re building the best place to run Ray, so that any developer or data scientist can scale an ML application from their laptop to the cluster without needing to be a distributed systems expert.

Proud to be backed by Andreessen Horowitz, NEA, and Addition with $250+ million raised to date.

About the role

Anyscale is looking for a Software Engineer to join the Infrastructure team. Anyscale aims to provide the next generation of tools and infrastructure to make developing and running distributed AI applications in the cloud as easy as on your laptop. As part of the Infra team, we build the scalable, secure, and robust backbone that enables this vision.

Our team is responsible for both the control plane, which orchestrates cluster management, scheduling, and user access, and the data plane, which ensures high-performance execution of distributed workloads.

We are seeking a talented Software Engineer with a strong background in control plane and data plane development, along with expertise in Kubernetes, container orchestration, and cloud-native infrastructure. You will play a crucial role in designing, implementing, and optimizing the critical infrastructure that powers Anyscale’s cloud platform.

You will have the opportunity to work on open-source Ray, contribute to our proprietary "infinite laptop" product, and develop seamless integration between the two, while also delivering high-impact features for our customers.

A snapshot of projects you may work on

  • Design, build, and scale services that orchestrate Ray clusters across cloud and on-prem environments, supporting both VM-based and Kubernetes-based deployments

  • Optimize control plane components for large-scale, distributed AI/ML workloads

  • Build intelligent scheduling and resource management systems for heterogeneous compute clusters

  • Develop features to enhance the reliability, performance, scalability, and observability of Anyscale-managed Ray workloads

  • Support and optimize accelerator integration (e.g., GPUs, TPUs)

  • Handle container image management and dependency resolution for distributed workloads

  • Participate in code reviews and in design and architecture discussions

  • Provide on-call support, working closely with customer and field teams to troubleshoot infrastructure issues

  • Collaborate with leading distributed systems and machine learning experts to push the boundaries of AI infrastructure

We'd love to hear from you if you have

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience

  • 3+ years of experience writing high-quality production code

  • Hands-on experience in building and maintaining highly available, scalable, and performant distributed systems

  • Expertise in cloud-native technologies (AWS, Azure, GCP) and Kubernetes-based deployments

  • Deep understanding of networking, security, and authentication mechanisms in cloud environments

  • Familiarity with observability stacks (Prometheus, Grafana, etc.)

  • Proficiency in Go and Python

  • Knowledge of low-level operating system foundations (Linux kernel, file systems, containers)

Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law.

Anyscale Inc. is an E-Verify company. You may review the Notice of E-Verify Participation and the Right to Work posters in English and Spanish.
