Site Reliability Engineer Incident Manager

NetApp

United (PA)

Remote

USD 111,000 - 157,000

Full time

Today

Job summary

A technology company is seeking a Site Reliability Engineer Incident Manager to enhance uptime and recovery in a complex multi-cloud environment. Responsibilities include managing incidents, leading process improvements, and communicating effectively with stakeholders. The ideal candidate has at least 2 years of experience and a bachelor's degree. Competitive salary and comprehensive benefits are offered.

Benefits

Medical insurance
Dental insurance
Educational assistance
401(k) plans

Qualifications

  • Minimum 2 years of experience in incident management.
  • Experience in large-scale environments is preferred.
  • Outstanding communication and presentation skills.

Responsibilities

  • Define and refine incident management workflows.
  • Manage proactive notification for planned events.
  • Lead after action reviews and root cause analyses.

Skills

Incident management
Change management
Problem management
Communication skills
DevOps
Cloud platforms (AWS/Azure/Google Cloud)

Education

Bachelor's degree or equivalent work experience

Job description

Job Summary

Site Reliability Engineer Incident Manager at NetApp. Remote - US. This role focuses on uptime and Time to Recovery (TTR) in a multi-region, multi-cloud environment within the Public Cloud Service group. You will collaborate with multiple teams to rapidly mitigate incidents, lead by example, and contribute actively.

Responsibilities
  • Define and refine incident management, change management and problem management-related workflows.
  • Reduce operational inefficiencies in the incident management process through automation and continuous process improvement. Identify escalation needs and trigger escalation accordingly.
  • Manage proactive notification for planned events and communications during critical incidents. Produce written and verbal updates for customers, partners, and senior leadership.
  • Create and maintain recovery playbooks for common customer patterns and issues.
  • Improve alert coverage and accuracy to drive down resolution times. Promote self-service tools and documentation to deflect incidents.
  • Lead after action reviews and root cause analyses. Complete timely postmortems to identify repair items and drive process/product improvements.
  • Present monthly incident availability and operability metrics to cross-functional leadership. Build dashboards for visibility into key business metrics.
  • Work outside normal business hours (evenings, weekends, holidays) as needed.
Education & Experience
  • Typically requires a minimum of 2 years of related experience with a bachelor’s degree; or 2 years and a master’s degree; or equivalent work experience.
  • Experience managing incidents and running incident management programs, preferably in large-scale environments.
  • Experience working with service owners leading a DevOps team in public cloud platforms such as AWS, Azure, or Google Cloud is a big plus.
  • A basic understanding of public cloud vendors such as AWS, Azure, Google Cloud, or others.
  • Outstanding communication and presentation skills, written and verbal. Strong empathy and listening skills.
  • Proven ability to solve problems, extract meaningful information, and take decisive action.
Equal Opportunity Employer

NetApp is committed to Equal Employment Opportunity (EEO) and to compliance with all federal, state and local laws that prohibit employment discrimination based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability or genetic information, pregnancy, protected veteran status, and any other protected classification.

About NetApp

We are forward-thinking technology people with heart. We embrace diversity and openness, collaborate with smart people, and push limits to reward great ideas. We value a healthy work-life balance and offer benefits including medical, dental, wellness, vision plans, educational assistance, legal services, and financial savings programs. If you run toward knowledge and problem-solving, NetApp could be the right place for you.

Notes

The base salary/hiring wage range for this position in USA/Canada is $111,300 - $156,500. Final compensation will depend on candidate location, qualifications, experience, and other factors. This role may include benefits such as PTO, medical plans, 401(k), and other relevant programs, as permitted by law.
