Lead Technical Program Manager, Site Reliability Engineering

Tbwa Chiat/Day Inc

San Francisco (CA)

On-site

USD 120,000 - 180,000

Full time

30+ days ago

Job summary

An established industry player is seeking a Senior Technical Program Manager to enhance operational excellence within their Site Reliability Engineering team. This pivotal role involves managing complex projects, optimizing costs, and ensuring the robustness of global cloud infrastructure. The ideal candidate will have extensive experience with SaaS applications and technical program management, with a strong focus on data-driven decision-making and cross-functional collaboration. Join a forward-thinking company that values diverse perspectives, encourages innovation, and is making a significant impact on the digital experience landscape.

Qualifications

  • 8+ years of experience in Site Reliability Engineering with a focus on SaaS applications.
  • Proven experience in technical program management within cloud infrastructure.

Responsibilities

  • Lead and manage complex SRE projects aligning with organizational goals.
  • Oversee best practices for incident management and capacity planning.

Skills

Site Reliability Engineering
Program Management
Analytical Skills
Problem-Solving
Communication

Education

Bachelor's degree in Computer Science
Master's degree in Engineering

Tools

AWS
Infrastructure as Code
Automation Tools

Job description

Senior Technical Program Manager, Site Reliability
Who We Are

Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even the ones they don’t own. Powered by AI and an unmatched set of cloud, internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues – before they impact end-user experiences.

ThousandEyes is deeply integrated across the entire Cisco technology portfolio and beyond, helping customers deploy at scale while also delivering AI-powered assurance insights within Cisco’s leading Networking, Security, Collaboration, and Observability portfolios.

Role Description

We are seeking an experienced Senior SRE Technical Program Manager to lead and drive operational excellence within our SRE team. This role is pivotal in ensuring the robustness, scalability, and efficiency of our global cloud infrastructure. The ideal candidate will have a proven track record of managing SaaS applications, optimizing costs, improving operational processes, and coordinating large-scale infrastructure changes. They will have a background in program management, along with experience partnering with SRE teams and working in cloud environments for both commercial and government customers.

What you'll do
  1. Strategic Program Management:
    • Lead and manage complex SRE projects and programs that align with organizational goals and priorities.
    • Develop and execute strategies for operational efficiency, reliability, and scalability of our cloud infrastructure.
  2. Cost Optimization:
    • Analyze and optimize infrastructure costs, aiming to reduce cost of goods sold (COGS) while maintaining or improving service quality.
    • Collaborate with finance and engineering teams to develop cost management dashboards and reporting tools.
  3. Operational Excellence:
    • Oversee the development and implementation of best practices for incident management, change management, and capacity planning.
    • Drive initiatives to improve system uptime, performance, and reliability, ensuring adherence to SLAs.
  4. Visibility and Reporting:
    • Enhance operational visibility through the development of comprehensive monitoring and alerting systems.
    • Lead operational reviews and post-mortems, ensuring actionable insights and continuous improvement.
  5. Cross-Functional Collaboration:
    • Work closely with software engineering teams to ensure infrastructure changes and dependencies are effectively communicated and executed.
    • Facilitate cross-team coordination to support large-scale infrastructure projects and initiatives.
  6. Technical Leadership:
    • Mentor and guide SRE team members, fostering a culture of technical excellence and innovation.
    • Stay abreast of the latest industry trends and technologies to drive continuous improvement.

Requirements
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 8+ years of experience in Site Reliability Engineering, with a focus on managing SaaS applications.
  • Proven experience in technical program management, preferably within a cloud infrastructure context.
  • Demonstrated ability to manage global cloud infrastructure with significant monthly COGS.
  • Strong analytical and problem-solving skills, with a focus on data-driven decision-making.
  • Excellent communication and collaboration skills, with the ability to influence and drive change across teams.
  • Proficiency in cloud platforms (especially AWS) and related technologies.
  • Experience with infrastructure as code, automation, and modern DevOps practices.
  • Experience migrating and maintaining both commercial and federal environments.

Role Success Metrics
  • Achieving a targeted reduction in COGS while maintaining service levels.
  • Improvement in system uptime and reliability metrics.
  • Successful execution of infrastructure projects within scope, on time, and within budget.
  • Enhanced operational visibility and reporting capabilities.

Cisco values the perspectives and skills that emerge from employees with diverse backgrounds. That's why Cisco is broadening how it discovers top talent, focusing not only on candidates' educational degrees and experience but also on unlocking potential. We believe that everyone has something to offer and that diverse teams are better equipped to solve problems, innovate, and create a positive impact.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification. Research shows that people from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy. We urge you not to prematurely exclude yourself and to apply if you're interested in this work.

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.

Similar jobs

Site Reliability Engineer (SRE)

Air Apps

San Francisco

Remote

USD 90,000 - 150,000

Today
Be an early applicant

Lead, Site Reliability Engineering, Infrastructure Security

MongoDB

San Francisco

Remote

USD 120,000 - 180,000

2 days ago
Be an early applicant

Exception - Engineering & IT

ICONMA

Newark

Remote

USD 120,000 - 160,000

9 days ago

Program Manager & Lead Verifier, Greenhouse Gas

-

Coffeyville

Remote

USD 100,000 - 135,000

5 days ago
Be an early applicant

Site Reliability Engineer

Iceberg

Remote

USD 175,000 - 200,000

Today
Be an early applicant

Y-Kids Program Leader

YMCA of San Francisco

San Francisco

On-site

USD 150,000 - 200,000

Yesterday
Be an early applicant

Network and Hybrid DevOps Engineer

SixMap, Inc.

Baltimore

Remote

USD 90,000 - 160,000

Today
Be an early applicant

Principal SRE (Site Reliability Engineer) - Remote

SailPoint Technologies Holdings, Inc.

Remote

USD 176,000 - 252,000

Today
Be an early applicant

Lead Site Reliability Engineer - Cloud Platforms

Jobot

Minneapolis

Remote

USD 160,000 - 200,000

2 days ago
Be an early applicant