Sr Manager, Site Reliability Engineering

Intelliswift - An LTTS Company

Chicago (IL)

On-site

USD 150,000 - 200,000

Full time

5 days ago

Job summary

Intelliswift - An LTTS Company is looking for a Senior Manager of Site Reliability Engineering in Chicago. The role leads a team responsible for the reliability of critical business applications and works closely with stakeholders across the business. Candidates should have strong leadership skills, extensive Site Reliability Engineering experience, and proficiency in modern DevOps practices and tools.

Qualifications

7+ years of IT and business/industry work experience.
5+ years of Site Reliability Engineering experience.
Strong communication and relationship management skills.

Responsibilities

Guide a team responsible for the instrumentation and analysis of vital business applications.
Develop and mentor the Site Reliability Engineering team.
Ensure system reliability and performance through best practices.

Skills

Leadership

Problem-solving

Communication

Collaboration

Education

Bachelor's degree in Information Technology or relevant field

Tools

GitHub

Jenkins

Dynatrace

AWS Cloudwatch

EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), RDS (Relational Database Service), VPC (Virtual Private Cloud), Lambda, and CloudFormation

ITSM/ITIL

Job overview and responsibilities

As the Senior Manager of Site Reliability Engineering, you are responsible for leading a team dedicated to the instrumentation and analysis of vital business applications, ensuring their availability and contributing to major incident resolution and root cause analysis. You are accountable for the strategy, assessment, deployment, and management of IT operations tools and methodologies. You lead technical experts who specialize in evaluating enterprise reliability and improving system efficiency. You also build and maintain strong relationships with digital technology and business executives at all levels. Drawing on deep technical knowledge and strong leadership and analytical skills, you guide your team to build highly available applications, adhere to best practices, and drive evidence-based system optimization in partnership with development teams using modern DevOps practices.

Design, Develop & Drive Outcomes:
Understand the potential impact of system requirements and design choices across multiple cloud and on-premises technologies
Embrace the role of developing and mentoring the Site Reliability Engineering team, fostering expertise in this critical area
Guide the team to devise solutions that meet long-term objectives while also addressing urgent technical debt
Position yourself as a prominent thought leader in Site Reliability Engineering principles, influencing others through your knowledge and experience
Regularly disseminate best practices and champion process improvements, both within your team and in collaboration with other teams, to drive collective success
Program Management & Delivery:
Track the team’s progress on projects and key performance indicators, while also offering concrete, actionable suggestions for further enhancing or influencing product or project delivery
Encourage cross-functional collaboration and gather input from technology teams to promote ongoing program enhancement
Regularly provide insights on critical Site Reliability Engineering metrics to showcase the program’s achievements and identify potential areas for improvement
Maintain up-to-date materials that communicate the program's current status, including progress, obstacles, opportunities, and strategic direction, to Digital Technology leaders
Effectively manage internal and external relationships to build and sustain strategic partnerships that advance the Site Reliability Engineering Program
Develop and roll out training initiatives so that partners are well equipped to fully use Observability programs
Oversee the 24/7 command center teams, ensuring they are adept at early detection, triage, and recovery for all applications and services, reducing mean time to recovery (MTTR)
Talent Management and People Development:
Initiate and facilitate the performance assessment process for your team, fostering an environment that encourages individuals at all performance tiers to excel
Establish and nurture relationships with team members to create a foundation of trust, identify gaps in technical or analytical skills, and devise strategies for improvement
Regularly encourage team members to share Site Reliability Engineering expertise and embrace new technologies
Lead and inspire teams to tackle intricate challenges and champion the use of open-source technologies and solutions
Organizational Effectiveness / People:
Lead by example, bringing robust technical expertise, strong leadership qualities, and a proven track record in Site Reliability Engineering
Your proficiency in driving the creation of multi-cloud infrastructure serves as a benchmark and motivates the team of developers and infrastructure engineers
Collaborate with your engineers to manage project dependencies, adeptly negotiate and plan for incremental delivery milestones with stakeholders, and achieve on-time project completion
Work closely with product teams to understand and address their performance and resilience concerns, and formulate sustainable strategies to resolve persistent challenges
Engineering Excellence and Practices:
Continuously work on enhancing the reliability, stability, and performance of our digital platforms, being at the forefront of promoting engineering excellence, implementing best practices, and overseeing the integration of fully automated telemetry within modern DevOps frameworks
Advance problem detection and service restoration processes, which are pivotal to this role
Utilizing cutting-edge Site Reliability Engineering methods, coupled with automated alerting and self-healing mechanisms, you are instrumental in improving both cloud-based and on-premises systems, thereby fortifying our digital infrastructure’s robustness and efficiency

Qualifications

What’s needed to succeed (Minimum Qualifications):

Bachelor's degree in Information Technology, Business Administration, Computer Science, or a relevant field
7+ years of IT and business/industry work experience
5+ years of Site Reliability Engineering experience working with telemetry, observability, self-healing solutions, and platform automation
5+ years of experience leading projects and managing people
2-3 years of leadership experience managing cross-functional teams or projects and influencing senior-level management and key stakeholders
2+ years of experience with leading DevOps practices and tools (CI/CD pipelines, Jenkins, GitHub)
Recognized expertise in the field, in industry and/or within United
Proven expertise in leading and influencing technical staff or coordinating work across multiple technology teams
Proven experience with monitoring, logging and telemetry tools like Dynatrace, Splunk, Prometheus, AWS Cloudwatch, etc.
Proficiency with DevOps practices and tools (CI/CD pipelines, Jenkins, GitHub)
Ability to diagnose and troubleshoot issues effectively
Strong and effective communication skills and status reporting
Experience with AWS networking services like VPC, Route 53, and CloudFront, with understanding of cloud concepts like IaaS, PaaS, and SaaS
Experience with AWS compute, storage, database, networking, and deployment services such as EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), RDS (Relational Database Service), VPC (Virtual Private Cloud), Lambda, and CloudFormation
Experience in developing monitoring tools and log analysis tools to manage operations
Dynatrace Associate Certification or AWS Certified DevOps Engineer is a plus
Must be legally authorized to work in the United States for any employer without sponsorship
Successful completion of interview required to meet job qualification
Reliable, punctual attendance is an essential function of the position

Seniority level
Mid-Senior level

Employment type
Contract

Job function
Information Technology
Industries
IT Services and IT Consulting, Airlines and Aviation, and Aviation and Aerospace Component Manufacturing

Reliability Engineering Manager jobs in Chicago, IL

Wheeling, IL $135,000.00-$150,000.00 1 week ago

Wheeling, IL $130,000.00-$150,000.00 1 day ago

Chicago, IL $202,000.00-$277,000.00 6 days ago

Engineering Technical Manager (Salary Range $127,800-$166,200)

Engineering Manager - Gear Manufacturing

Oak Brook, IL $100,000.00-$140,000.00 2 weeks ago

Chicago, IL $101,915.00-$119,900.00 6 days ago

Engineering Manager - Onboarding Experience

Chicago, IL $79,000.00-$132,000.00 3 weeks ago

Chicago, IL $135,000.00-$160,000.00 4 months ago

Senior Manager, Mechanical Engineering - Locomotives- EN

Homewood, IL $127,000.00-$167,000.00 1 week ago

Chicago, IL $106,300.00-$146,200.00 5 days ago

Chicago, IL $150,000.00-$200,000.00 3 weeks ago

Chicago, IL $129,800.00-$165,490.00 6 days ago

Engineering Manager, Merchant Experiences (Stripe Dashboard)

Chicago, IL $214,500.00-$321,800.00 3 days ago

Power Utility Distribution Engineering Manager

Chicago, IL $160,000.00-$180,000.00 21 hours ago

Remote Engineering Manager - $170-$190k (Wearable Med Device)

Chicago, IL $170,000.00-$190,000.00 5 days ago

Greater Chicago Area $120,000.00-$160,000.00 3 days ago

Senior Manager, Mechanical Engineering - Locomotives- FR

Manager, Renewable Engineering (Energy Storage)

Chicago, IL $125,000.00-$150,000.00 2 weeks ago

Chicago, IL $143,000.00-$190,000.00 1 day ago

Similar jobs

Senior Manager, Site Reliability Engineering

Precisely

Remote

USD 120,000 - 180,000

Today

Senior Manager, Site Reliability Engineering

Centene

Remote

USD 119,000 - 221,000

3 days ago

Senior Manager Site Reliability Engineering (Kubernetes)- Remote

Akamai Technologies

Remote

USD 155,000 - 324,000

30+ days ago