Sr Manager, Site Reliability Engineering

United Airlines

Chicago (IL)

On-site

USD 137,000 - 187,000

Full time

30+ days ago

Job summary

United Airlines is seeking a Senior Manager of Site Reliability Engineering to lead a team focused on ensuring the reliability of business applications and optimizing system efficiency. This role involves strategic leadership, program management, and fostering cross-functional collaboration to enhance service delivery and operational excellence.

Benefits

Medical, dental, vision, life, accident & disability insurance

Parental leave

Employee assistance program

Paid holidays and time off

401(k) plan

Flight privileges

Qualifications

7+ years of IT and business/industry work experience.
5+ years of Site Reliability Engineering experience.
2+ years of experience with leading DevOps practices.

Responsibilities

Guide a team dedicated to the instrumentation and analysis of vital business applications.
Oversee the 24/7 command center teams for early detection and recovery.
Track the team’s progress on projects and key performance indicators.

Skills

Leadership

Problem Solving

Communication

Project Management

Technical Expertise

Education

Bachelor's degree in Information Technology, Business Administration, Computer Science, or a relevant field

Tools

Dynatrace

Splunk

Prometheus

AWS CloudWatch

CI/CD pipelines

Jenkins

GitHub

Achieving our goals starts with supporting yours. Grow your career, access top-tier health and wellness benefits, build lasting connections with your team and our customers, and travel the world using our extensive route network.

Come join us to create what’s next. Let’s define tomorrow, together.

Description

Job overview and responsibilities

As the Senior Manager of Site Reliability Engineering, you are responsible for guiding a team dedicated to the instrumentation and analysis of vital business applications, ensuring their availability, and contributing to major incident resolution and root cause analysis. You are accountable for devising the strategy for IT operations tools and methodologies, as well as for their assessment, deployment, and management. Your leadership role involves steering technical experts who specialize in evaluating enterprise reliability and enhancing system efficiency. You are also tasked with building and maintaining strong relationships with digital technology and business executives at all levels. Drawing on deep technical knowledge and strong leadership and analytical abilities, you lead your team, in partnership with development teams and using modern DevOps practices, towards creating highly available applications, adhering to best practices, and promoting system optimization based on empirical evidence.

Design, Develop & Drive Outcomes:
- Understand the potential impact of system requirements and design choices across multiple cloud and on-premise technologies
- Embrace the role of developing and mentoring the Site Reliability Engineering team, fostering expertise in this critical area
- Guide the team to devise solutions that not only meet long-term objectives but also effectively address urgent technical debts
- Position yourself as a prominent thought leader in Site Reliability Engineering Principles, influencing others through your knowledge and experience
- Regularly disseminate best practices and champion process improvements, both within your team and in collaboration with other teams, to drive collective success
Program Management & Delivery:
- Track the team’s progress on projects and key performance indicators, while also offering concrete, actionable suggestions for further enhancing or influencing product or project delivery
- Encourage cross-functional collaboration and gather input from technology teams to promote ongoing program enhancement
- Regularly provide insights on critical Site Reliability Engineering metrics to showcase the program’s achievements and identify potential areas for improvement
- Keep an updated collection of materials to communicate the current status, including progress, obstacles, opportunities, and the program’s strategic direction to Digital Technology leaders
- Effectively manage both internal and external relationships to foster and sustain beneficial strategic partnerships, thereby advancing the success of the Site Reliability Engineering Program
- Develop and roll out training initiatives to ensure that partners are well-equipped to fully utilize Observability programs
- Oversee the 24/7 command center teams, ensuring they are adept at early detection, triage, and recovery for all applications and services, which contributes to a reduced mean time to recovery
Talent Management and People Development:
- Initiate and facilitate the performance assessment process for your team, fostering an environment that encourages individuals at all performance tiers to excel
- Establish and nurture relationships with team members to create a foundation of trust, recognizing areas where technical or analytical skills are lacking and devising strategies for improvement
- Regularly encourage team members to exchange expertise about Site Reliability Engineering practices and embrace new technologies
- Lead and inspire teams to tackle intricate challenges and champion the use of open-source technologies and solutions
Organizational Effectiveness / People:
- Lead by example, drawing on robust technical expertise, leadership qualities, and a proven track record in Site Reliability Engineering
- Set a benchmark through your proficiency in driving the creation of multi-cloud infrastructure, motivating the team of developers and infrastructure engineers
- Collaborate with your engineers to manage project dependencies, adeptly negotiate and plan for incremental delivery milestones with stakeholders, and achieve on-time project completion
- Work closely with product teams to understand and address their performance and resilience concerns, and formulate sustainable strategies to resolve persistent challenges
Engineering Excellence and Practices:
- Continuously enhance the reliability, stability, and performance of our digital platforms by promoting engineering excellence, implementing best practices, and overseeing the integration of fully automated telemetry within modern DevOps frameworks
- Advance problem detection and service restoration processes
- Use cutting-edge Site Reliability Engineering methods, coupled with automated alerting and self-healing mechanisms, to improve both cloud-based and on-premises systems, thereby fortifying our digital infrastructure’s robustness and efficiency

Qualifications

What’s needed to succeed (Minimum Qualifications):
Bachelor's degree in Information Technology, Business Administration, Computer Science, or a relevant field
7+ years of IT and business/industry work experience
5+ years of Site Reliability Engineering experience working with telemetry, observability, self-healing solutions, and platform automation
5+ years of experience leading projects and managing people
2-3 years of leadership experience in managing cross-functional teams or projects, and influencing senior-level management and key stakeholders
2+ years of experience with leading DevOps practices and tools (CI/CD pipelines, Jenkins, GitHub)
Recognized expertise in field - in industry and/or within United
Proven expertise in leading and influencing technical staff or coordinating work across multiple technology teams
Proven experience with monitoring, logging, and telemetry tools such as Dynatrace, Splunk, Prometheus, and AWS CloudWatch
Proficiency with DevOps practices and tools (CI/CD pipelines, Jenkins, GitHub)
Ability to diagnose and troubleshoot issues effectively
Strong and effective communication skills and status reporting
Experience with AWS networking services like VPC, Route 53, and CloudFront, with understanding of cloud concepts like IaaS, PaaS, and SaaS
Experience with AWS services such as EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), RDS (Relational Database Service), VPC (Virtual Private Cloud), Lambda, and CloudFormation
Experience in developing monitoring tools and log analysis tools to manage operations
Experience in one or more general purpose programming languages: Python, JavaScript, shell scripting (Unix/Linux)
Dynatrace Associate Certification or AWS Certified DevOps Engineer is a plus
Must be legally authorized to work in the United States for any employer without sponsorship
Successful completion of interview required to meet job qualification
Reliable, punctual attendance is an essential function of the position

The base pay range for this role is $137,275.00 to $187,000.00.
The base salary range/hourly rate listed is dependent on job-related, non-discriminatory factors such as experience, education, and skills. This position is also eligible for bonus and/or long-term incentive compensation awards.

You may be eligible for the following competitive benefits: medical, dental, vision, life, accident & disability, parental leave, employee assistance program, commuter, paid holidays, paid time off, 401(k) and flight privileges.

United Airlines is an equal opportunity employer. United Airlines recruits, employs, trains, compensates and promotes regardless of race, religion, color, national origin, gender identity, sexual orientation, physical ability, age, veteran status and other protected status as required by applicable law. Equal Opportunity Employer - Minorities/Women/Veterans/Disabled/LGBT.

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process and to perform crucial job functions. Please contact JobAccommodations@united.com to request accommodation.
