Sr Manager, Site Reliability Engineering

Intelliswift - An LTTS Company

Chicago (IL)

On-site

USD 150,000 - 200,000

Full time

5 days ago

Job summary

A leading company, Intelliswift - An LTTS Company, is looking for a Senior Manager of Site Reliability Engineering in Chicago. This role entails guiding a dynamic team to ensure the reliability of critical business applications while collaborating seamlessly with various stakeholders. Candidates should possess strong leadership capabilities, extensive experience in Site Reliability Engineering, and proficiency in modern DevOps practices and tools.

Qualifications

  • 7+ years of IT and business/industry work experience.
  • 5+ years of Site Reliability Engineering experience.
  • Strong communication and relationship management skills.

Responsibilities

  • Guide a team responsible for the instrumentation and analysis of vital business applications.
  • Develop and mentor the Site Reliability Engineering team.
  • Ensure system reliability and performance through best practices.

Skills

Leadership
Problem-solving
Communication
Collaboration

Education

Bachelor's degree in Information Technology or relevant field

Tools

GitHub
Jenkins
Dynatrace
AWS CloudWatch

Job description

Sr Manager, Site Reliability Engineering

  • AWS services: EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), RDS (Relational Database Service), VPC (Virtual Private Cloud), Lambda, and CloudFormation
  • ITSM/ITIL

Job overview and responsibilities

As the Senior Manager of Site Reliability Engineering, you guide a team dedicated to the instrumentation and analysis of vital business applications, ensuring their availability and contributing to major incident resolution and root cause analysis. You are accountable for the strategy, assessment, deployment, and management of IT operations tools and methodologies. You lead technical experts who specialize in evaluating enterprise reliability and improving system efficiency. You also build and maintain strong relationships with digital technology and business executives at all levels. Drawing on deep technical knowledge and strong leadership and analytical skills, you lead your team in building highly available applications, following best practices, and, in partnership with development teams, promoting evidence-based system optimization through modern DevOps practices.

Design, Develop & Drive Outcomes:

  • Understand the potential impact of system requirements and design choices across multiple cloud and on-premises technologies
  • Develop and mentor the Site Reliability Engineering team, fostering expertise in this critical area
  • Guide the team to devise solutions that meet long-term objectives while also addressing urgent technical debt
  • Act as a prominent thought leader in Site Reliability Engineering principles, influencing others through your knowledge and experience
  • Regularly share best practices and champion process improvements, both within your team and in collaboration with other teams

Program Management & Delivery:

  • Track the team’s progress on projects and key performance indicators, and offer concrete, actionable suggestions for improving product or project delivery
  • Encourage cross-functional collaboration and gather input from technology teams to promote ongoing program enhancement
  • Regularly report on critical Site Reliability Engineering metrics to showcase the program’s achievements and identify areas for improvement
  • Maintain up-to-date materials that communicate the program’s status, including progress, obstacles, opportunities, and strategic direction, to Digital Technology leaders
  • Manage internal and external relationships to foster and sustain beneficial strategic partnerships that advance the success of the Site Reliability Engineering Program
  • Develop and roll out training initiatives to ensure that partners are well-equipped to fully utilize Observability programs
  • Oversee the 24/7 command center teams, ensuring they are adept at early detection, triage, and recovery for all applications and services, which contributes to a reduced mean time to recovery

Talent Management and People Development:

  • Initiate and facilitate the performance assessment process for your team, fostering an environment that encourages individuals at all performance tiers to excel
  • Establish and nurture relationships with team members to create a foundation of trust, recognizing areas where technical or analytical skills are lacking and devising strategies for improvement
  • Regularly encourage team members to exchange expertise about Site Reliability Engineering practices and embrace new technologies
  • Lead and inspire teams to tackle intricate challenges and champion the use of open-source technologies and solutions

Organizational Effectiveness / People:

  • Lead by example, drawing on robust technical expertise, leadership qualities, and a proven track record in Site Reliability Engineering
  • Set the benchmark in driving the creation of multi-cloud infrastructure, motivating the team of developers and infrastructure engineers
  • Collaborate with your engineers to manage project dependencies, negotiate and plan incremental delivery milestones with stakeholders, and achieve on-time project completion
  • Work closely with product teams to understand and address their performance and resilience concerns, and formulate sustainable strategies to resolve persistent challenges

Engineering Excellence and Practices:

  • Continuously enhance the reliability, stability, and performance of our digital platforms by promoting engineering excellence, implementing best practices, and overseeing the integration of fully automated telemetry within modern DevOps frameworks
  • Advance problem detection and service restoration processes
  • Use cutting-edge Site Reliability Engineering methods, together with automated alerting and self-healing mechanisms, to improve both cloud-based and on-premises systems and strengthen the robustness and efficiency of our digital infrastructure

Qualifications

What’s needed to succeed (Minimum Qualifications):

  • Bachelor's degree in Information Technology, Business Administration, Computer Science, or relevant field
  • 7+ years of IT and business/industry work experience
  • 5+ years of Site Reliability Engineering experience working with telemetry, observability, self-healing solutions, and platform automation
  • 5+ years of experience leading projects and managing people
  • 2-3 years of leadership experience managing cross-functional teams or projects and influencing senior-level management and key stakeholders
  • 2+ years of experience with leading DevOps practices and tools (CI/CD pipelines, Jenkins, GitHub)
  • Recognized expertise in the field, in industry and/or within United
  • Proven expertise in leading and influencing technical staff or coordinating work across multiple technology teams
  • Proven experience with monitoring, logging, and telemetry tools such as Dynatrace, Splunk, Prometheus, and AWS CloudWatch
  • Proficiency with DevOps practices and tools (CI/CD pipelines, Jenkins, GitHub)
  • Ability to diagnose and troubleshoot issues effectively
  • Strong and effective communication skills and status reporting
  • Experience with AWS networking services like VPC, Route 53, and CloudFront, with understanding of cloud concepts like IaaS, PaaS, and SaaS
  • Experience with AWS compute, storage, and infrastructure services such as EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), RDS (Relational Database Service), VPC (Virtual Private Cloud), Lambda, and CloudFormation
  • Experience in developing monitoring tools and log analysis tools to manage operations
  • Dynatrace Associate Certification or AWS Certified DevOps Engineer is a plus
  • Must be legally authorized to work in the United States for any employer without sponsorship
  • Successful completion of interview required to meet job qualification
  • Reliable, punctual attendance is an essential function of the position

  • Seniority level: Mid-Senior level
  • Employment type: Contract
  • Job function: Information Technology
  • Industries: IT Services and IT Consulting, Airlines and Aviation, and Aviation and Aerospace Component Manufacturing
