Telecoms: Senior Manager, Technology Operations & SRE (Reliability)
The Senior Manager, Technology Operations & SRE (Reliability) is a critical leadership role, responsible for ensuring the stability, performance, and resilience of our AWS-hosted open-source telephony platform (SIP, Kamailio, Asterisk, FreeSWITCH, etc.). This role combines hands-on technical expertise in cloud-native environments with strategic people management, serving as a key Incident Commander during critical incidents and driving Site Reliability Engineering (SRE) best practices. You will lead a team of ~20 engineers (SRE and augmented DevOps, including offshore resources), foster cross-functional collaboration, and champion a culture of passion, tenacity, and proactive communication to maintain a highly available platform in a fast-paced, mission-driven organization.
Key Responsibilities:
- Platform Reliability: Ensure the availability, performance, and resilience of our client’s cloud-based telephony platform, leveraging AWS services (EC2, EKS, Route 53, CloudWatch) and monitoring tools (NetScout, OpsGenie, Datadog) to support real-time communication.
- Incident Management: Act as Incident Commander, leading rapid response to outages, call quality issues, or captioning delays, using tools like PagerDuty and AWS CloudWatch to minimize customer impact and provide proactive updates to senior leadership.
- Root Cause Analysis: Conduct thorough RCAs for incidents, implementing corrective actions and refining runbooks to prevent recurrence, with a focus on reducing escalations through effective triage (targeting 80% resolution without escalation).
- Automation & Observability: Develop automation scripts (Python, Bash) and enhance observability with tools like Prometheus, Grafana, and Datadog to monitor WebRTC metrics, captioning accuracy, and infrastructure health, enabling proactive issue detection.
- AWS Expertise: Optimize AWS infrastructure (EC2, EKS, S3, Lambda, Route 53) and Kubernetes clusters for scalability, fault tolerance, and low-latency workloads, mentoring the team to improve platform reliability.
- SRE Best Practices: Drive SRE principles (SLOs, SLIs, error budgets) and SDLC processes, transitioning the team from a disbanded NOC model to a mature SRE framework, focusing on production support and reducing project distractions.
- Team Leadership: Manage and mentor a team of ~20 (SRE and augmented DevOps), fostering a culture of passion, adaptability, and collaboration. Navigate team morale during leadership transitions, winning trust while maintaining objective decision-making.
- Proactive Communication: Provide high-level updates to senior leadership during major platform changes, educating stakeholders on monitoring and outcomes to preempt inquiries and align with organizational goals.
- Cross-Functional Collaboration: Partner with engineering, product, and compliance teams to address reliability gaps, optimize captioning performance, and ensure compliance.
- Continuous Improvement: Stay informed on industry trends (e.g., WebRTC, AI-driven transcription) to enhance telephony architecture and captioning workflows, leveraging analytical skills to interpret platform data.
Qualifications:
Technical Skills
- Experience: 8+ years in Technology Operations, DevOps, or SRE, with strong expertise in AWS cloud-native environments (EC2, EKS, S3, Lambda, CloudWatch, Route 53).
- Observability Tools: Proficiency with NetScout, OpsGenie, Datadog, Prometheus, or Grafana for monitoring infrastructure and application metrics.
- Automation: Strong coding knowledge in Python or Bash for automating workflows and processes.
- DevOps Tools: Experience with Terraform, Jenkins, or GitLab CI for Infrastructure as Code and CI/CD pipelines.
- SRE & SDLC: Deep knowledge of SRE principles (SLOs, SLIs, blameless postmortems) and SDLC processes, with experience building or transitioning teams to SRE models.
- Telephony Knowledge (Preferred): Familiarity with VoIP protocols (SIP, RTP, WebRTC) and open-source telephony software (Asterisk, Kamailio, FreeSWITCH) is a huge plus.
- Networking: Basic understanding of network troubleshooting (e.g., Wireshark) and QoS optimization for low-latency communication.
- Captioning (Optional): Experience with real-time transcription systems (e.g., AWS Transcribe) or caption formats (WebVTT, SRT) is a plus.
Leadership & Soft Skills
- Leadership: Proven ability to lead and mentor diverse technical teams in a remote, high-stakes environment, with experience managing morale during transitions.
- Tenacity & Passion: A proactive, “adapt and overcome” mindset, thriving in a 24/7 support environment with a passion for mission-driven work.
- Communication: Exceptional verbal and written skills for proactive stakeholder updates, cross-functional collaboration, and presenting to non-technical audiences, including compliance teams.
- Problem-Solving: Strong analytical skills to diagnose complex issues under pressure and interpret platform data for decision-making.
- Culture Fit: Ability to align with the fast-paced, collaborative culture.
- Time Management: Adept at prioritizing tasks and managing high-stakes responsibilities in a dynamic setting.
- Confidentiality: Commitment to handling sensitive customer data in compliance with regulations.
- 100% Remote: Work from home, with flexibility to collaborate across US time zones.
Seniority level
Mid-Senior level
Employment type
Job function
Engineering, Information Technology, and Management
Industries
Telecommunications