Staff Site Reliability Engineer

Moveworks

Mountain View (CA)

On-site

USD 227,000 - 290,000

Full time

9 days ago

Job summary

Join a thriving AI startup as a Staff Site Reliability Engineer responsible for enhancing the performance and reliability of Moveworks' infrastructure. You will collaborate with teams to design scalable solutions and lead initiatives that drive operational excellence, ensuring high service levels and reliability. The role requires significant experience in distributed systems, cloud technologies, and operational design patterns, contributing to one of the fastest-growing technology companies in its field.

Qualifications

  • 7+ years of experience in operating complex distributed infrastructure.
  • Strong experience with AWS, GCP, or Azure.
  • Proficiency in Python, Golang, Java, or C++.

Responsibilities

  • Ensure health, performance, and capacity of infrastructure.
  • Design and implement operational efficiency solutions.
  • Participate in on-call rotation and drive discussions for outages.

Skills

Container orchestration
Cloud infrastructure
Unix/Linux proficiency
Distributed system design
Troubleshooting complex systems

Education

BS+ in computer science or a related field

Job description

What You Will Do

As a site reliability engineer, you will own the overall health, performance, and capacity of the Moveworks AI infrastructure and services. In addition to helping engineering teams resolve operational issues, you will design and implement solutions, tools, and practices that improve operational efficiency and product SLA. This role is a blend of SRE, infrastructure, and software development.

We’re building a team that indexes on moving fast, solving challenging product/engineering problems, and providing value to our customers. To be successful, you'll partner with and enable machine learning, search, product, data, and full-stack teams to design and build fault-tolerant and scalable infrastructure, services, and features. This is an opportunity to play an integral role at the fastest-growing AI startup in its space.

  • Design, develop, and evolve site reliability and chaos engineering for Moveworks infrastructure and services.
  • Work closely with machine learning, search, product, infrastructure, data, and frontend teams to understand their infrastructure and operational needs and build solutions that are optimal, fault-tolerant, and scalable.
  • Advocate for reliability through best-practice distributed system design patterns (error handling, retries, rate limiting, circuit breaking, etc.). Participate in design discussions and ensure operational readiness of infrastructure, services, and features.
  • Design and build tools, libraries, and frameworks that allow engineering teams to rapidly deploy and scale Moveworks infrastructure and applications.
  • Review and participate in application performance analysis / tuning and capacity planning.
  • Set up and maintain monitoring, metrics, and reporting systems for observability and actionable alerting.
  • Define key internal and customer-facing SLA metrics, and implement solutions and practices with different teams to improve those metrics.
  • Own the engineering on-call process and setup. Drive discussions for outages, root cause analysis, and action items.
  • Participate in on-call rotation for second-tier escalation (at Moveworks, each engineer participates in the team-specific first-tier on-call rotation). Help diagnose and resolve complex operational issues.

What You Bring To The Table

  • 7+ years of experience in authoring and operating complex distributed infrastructure and applications
  • Strong experience with a container orchestration platform such as Kubernetes and cloud infrastructure such as AWS / GCP / Azure
  • Very high proficiency with Unix/Linux, TCP/IP, DNS, load balancers, autoscaling, file systems, and different types of data stores
  • Software development proficiency with Python, Golang, Java, or C++
  • Experience working across teams and implementing solutions, tools, and practices to improve observability, reliability, and scalability
  • Desire to work at a startup pace in a small company with a high degree of ownership
  • Strong motivation, gumption, and an appetite for continuous, incremental changes and completing challenging projects fast
  • High level of curiosity about engineering outside of your immediate discipline and an incessant desire to learn
  • BS+ in computer science or a related field

Compensation Range: $227,000 - $290,000

  • Our compensation package includes a market-competitive salary, equity for all full-time roles, exceptional benefits, and, for applicable roles, commissions or bonus plans.

Final offers may vary from the amount listed based on geography, the role’s scope and complexity, the candidate’s experience and expertise, and other factors.

Moveworks Is An Equal Opportunity Employer

  • Moveworks is proud to be an equal opportunity employer. We provide employment opportunities without regard to age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, veteran status, or any other characteristics protected by law.

Who We Are

Moveworks is an AI Assistant that helps all employees find information, automate tasks, and be more productive. We give the entire workforce one interface to get answers and take action across every enterprise system. And for developers, we make it easy to build and deploy AI agents that bring the power of Moveworks to every business process or workflow.

It’s all powered by a pioneering Reasoning Engine paired with an Agentic Automation Engine that, together, are able to handle even the most complex requests by understanding queries, then building and executing intelligent plans to fulfill them — in seconds.

Founded in 2016, Moveworks has raised $315M in funding, and eclipsed $100M in ARR in 2024 thanks to our award-winning product and team. Along the way, we’ve earned recognition as a leader in the Forrester Wave for Conversational AI Platforms for Employee Services, as a member of the Forbes Cloud 100 and AI 50 lists, and as one of America’s Most Loved Workplaces according to Newsweek.

Today, Moveworks has over 500 employees in six offices globally, and is backed by some of the world's most prominent investors including Kleiner Perkins, Lightspeed, Bain Capital Ventures, Sapphire Ventures, Iconiq, and more.

Over 350 leading organizations like Marriott, Databricks, Toyota, CVS Health, and Honeywell trust Moveworks to increase operational efficiency, enhance the employee experience, and drive lasting AI transformation.

Come join one of the most innovative teams on the planet!

  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Software Development
