
SRE Manager

Gravity IT Resources

Charlotte (NC)

On-site

USD 120,000 - 150,000

Full time

Today

Job summary

A technology solutions company in Charlotte, NC is seeking an experienced SRE Manager to lead the expansion of SRE practices. This role focuses on enhancing operational reliability, driving automation initiatives, and managing a global SRE team. The ideal candidate has strong expertise in Azure technologies, experience with automation tools like Terraform, and excellent communication skills. A commitment to fostering collaboration between teams is essential.

Qualifications

  • Proven experience in building and leading Operational and Engineering teams.
  • Experience in overseeing incident response processes and conducting post-incident analysis.
  • Strong expertise in Azure technologies and understanding of Agile and ITIL frameworks.

Responsibilities

  • Lead the expansion of SRE practices and evaluate operational workflows.
  • Define and implement an automation framework and review SLIs, SLOs, and SLAs.
  • Upskill team members and foster a high-performing team culture.

Skills

Building and leading operational teams
Collaboration between SRE and app development
Monitoring & Observability tools
Automation initiatives using Terraform
Strong scripting in Python or PowerShell
Excellent communication skills

Tools

Terraform
ServiceNow
Azure DevOps
LogicMonitor
Prometheus
Grafana
Splunk
Container orchestration

Job description

Job Title: SRE Manager
Location: Charlotte, NC (onsite)
Key Responsibilities:

  • Lead the expansion of SRE practices from a small, high-performing team to a larger global function incorporating on-premises infrastructure technologies.
  • Evaluate current operational workflows and RACIs, identify toil, and complete an assessment of skills across the global team.
  • Execute a comprehensive roadmap to transition reactive day-to-day operational activities into proactive, SRE-aligned processes with a focus on reliability, automation, observability, and incident management.
  • Upskill team members through tailored training programs on SRE principles, cloud operations and automation tools.
  • Collaborate with architects, platform engineering, ServiceNow developers, and application teams to define and implement an observability framework that enhances proactive incident detection and reduces MTTR.
  • Define and implement an automation framework to ensure sustainable, responsible, and effective use of automation to reduce toil and risk.
  • Define and regularly review SLIs, SLOs, SLAs, error budgets, and incident response processes.
  • Oversee recruitment, orientation, and professional development of the global SRE team.
  • Foster a high-performing team culture.
  • Build strong relationships with internal and external stakeholders.
  • Prepare and present reports on operational performance.
  • Oversee incident response and post-incident analysis processes and drive a culture of blameless post-mortems across multiple teams.

Key Requirements:

  • Proven experience in building and leading operational and engineering teams.
  • Adept at fostering collaboration between SRE and application development teams to drive operational excellence, reduce downtime, and help application teams accelerate delivery cycles.
  • Experience defining and monitoring SRE practices, including SLIs, SLOs, SLAs, error budgets, and incident response strategies.
  • Experience overseeing incident response processes; skilled in post-incident analysis and in conducting blameless post-mortems with multiple teams, driving proactive measures to prevent future incidents.
  • Experience spearheading automation initiatives using Terraform that significantly reduced infrastructure provisioning time.
  • Experience with monitoring and observability tools such as LogicMonitor, Azure Monitor, Prometheus, Grafana, Dynatrace, and Splunk.
  • Experience with ServiceNow and Azure DevOps and solid understanding of Agile, ITIL and ITSM frameworks.
  • Strong expertise in Azure technologies. Experience with other CSPs highly beneficial.
  • Proficiency in IaC tools including Terraform.
  • Experience with SharePoint administration highly beneficial.
  • Experience with container orchestration.
  • Strong scripting or programming skills (e.g., Python, PowerShell).
  • Excellent communication skills.
  • Experience in managing other managers highly beneficial.

Equal Employment Opportunity Statement
Gravity IT Resources is an Equal Opportunity Employer. We are committed to creating an inclusive environment for all employees and applicants. We do not discriminate on the basis of race, color, religion, sex (including pregnancy, sexual orientation, or gender identity), national origin, age, disability, genetic information, veteran status, or any other legally protected characteristic. All employment decisions are based on qualifications, merit, and business needs.
