
Technical Duty Officer / Sr. Site Reliability Engineer

Xero

United States

Remote

USD 185,000 - 230,000

Full time

7 days ago

Job summary

Join a forward-thinking company as a Site Reliability Engineer, where you will enhance incident management processes and lead critical responses to outages. This role emphasizes collaboration with product teams to improve service scalability and operational efficiency. You will be at the forefront of transforming the SRE culture, advocating best practices, and ensuring high reliability across all products. With a commitment to diversity and inclusion, this innovative firm offers a supportive environment for all employees. Embrace a career where your contributions make a significant impact on the business and its customers.

Benefits

Generous paid leave
Mental health support
401k matching
Parental leave
Employee share plans
Flexible work arrangements
Career development programs

Qualifications

  • 5+ years of experience in Site Reliability Engineering or related roles.
  • Hands-on experience troubleshooting AWS services and networking issues.

Responsibilities

  • Own and refine incident management processes for reliability.
  • Lead during outages, coordinating teams for quick resolutions.
  • Develop scalable processes and observability strategies.

Skills

Site Reliability Engineering
AWS Services
Network Troubleshooting (TCP/IP, SSL/TLS, DNSSEC, IPsec, BGP)
Python Programming
Incident Management
Communication Skills

Job description

Our Purpose

At Xero, we’re here to help you supercharge your business. We do this by automating routine tasks, surfacing actionable insights, and connecting businesses with the right data, advisors, and apps. When that happens, we’re not only making life better for small businesses, but also building a stronger economy that can change the world.

We aim to make running a business beautiful by making small businesses more efficient daily, connecting them with advanced technology, and empowering a supportive community. This potential is limitless, and through these efforts, we contribute to a better economy and a better world.

How You'll Make an Impact
  • As Xero grows, maintaining high reliability is essential to meet customer expectations. Our Incident and Problem Management team, part of the Site Reliability Engineering (SRE) organization, is responsible for building, delivering, and maintaining robust incident management processes and tools. They drive enduring reliability through swift responses to high-severity incidents and ensure process maturity aligns with business growth.
  • We seek an experienced SRE professional with a strong technical background, deep SRE expertise, and a passion for developing resilient processes. The candidate should have extensive experience leading responses to major cloud issues, driving best practices, and transforming the SRE culture at Xero. Excellent communication skills are essential for leading technical discussions and tracking incident-related actions.
What You'll Do:
  • Own and refine the incident management process to ensure ongoing reliability across all Xero products and services.
  • Lead during critical outages, coordinating multiple teams for quick decision-making and resolution.
  • Promote and lead the transformation towards a world-class SRE organization, advocating SRE principles within Engineering.
  • Address global customer environment issues with a customer-focused approach, fostering continuous learning and technical excellence within the team.
  • Develop scalable processes and observability strategies for rapid diagnosis, response, and reliability.
  • Collaborate with product teams to analyze failures, applying insights to improve service scalability and operational efficiency.
  • Provide training to ensure process understanding and adherence, including incident commander training for lower-priority issues.
  • Conduct proactive deep dives into incidents to identify and mitigate future risks, and develop playbooks and automated responses for business continuity and disaster recovery scenarios.
What You'll Bring:
  • 5+ years of experience as a Site Reliability Engineer or in a related Operations/Engineering role.
  • Hands-on troubleshooting experience with AWS services and networking issues (TCP/IP, SSL/TLS, DNSSEC, IPsec, BGP).
  • Proficiency in coding (preferably Python) for automation, scripting, and tool development.
  • Strong communication skills, including the ability to translate technical issues into actionable steps.
Salary: $185,000 - $230,000 per year

Why Xero?

We value diversity of thought and foster a human-first culture rooted in respect, fairness, and inclusion. Our benefits include generous paid leave, mental health support, employee resource groups, wellbeing programs, comprehensive insurance, family support, 401k matching, parental leave, employee share plans, modern offices, flexible work arrangements, career development, and more. Join us to do your best work at Xero.

We encourage candidates from underrepresented groups to apply, even if their experience doesn't align perfectly with every requirement. If you need support or accommodations during the application process, please let us know.
