Manager Enterprise Monitoring And Observability

PartnerUp (Pty) Ltd

Johannesburg

On-site

ZAR 600 000 - 1 000 000

Full time

30+ days ago

Job summary

An established industry player is seeking a Manager of Enterprise Monitoring and Observability to lead a dynamic team within their Bank Command Center. This pivotal role involves establishing proactive monitoring solutions and ensuring the resilience of business-critical systems. You will engage with various stakeholders, manage team performance, and drive automation initiatives to enhance operational efficiency. If you have a strong background in technology management and a passion for continuous improvement, this opportunity offers a chance to make a significant impact in a collaborative environment focused on innovation and excellence.

Qualifications

  • 5+ years in technology with 3+ years in management preferred.
  • Intermediate knowledge of ITSM, DevOps, and cloud technologies required.
  • Experience with observability and monitoring tools essential.

Responsibilities

  • Lead the Monitoring and Observability team focused on proactive solutions.
  • Develop enterprise logging and metrics for critical systems.
  • Collaborate with stakeholders to define SLAs and thresholds.

Skills

Leadership
Team Management
Automation
Incident Management
Problem Management
Communication Skills
Cloud Technologies
Service-Oriented Architecture
Negotiation Skills
Performance Reporting

Education

Bachelor's degree in technology
Associate degree in technology

Tools

Grafana Enterprise
Dynatrace
Datadog
New Relic
Opsgenie
AWS CloudWatch
Kubernetes
Terraform
OpenTelemetry
GitHub

Job description

As a Manager of Enterprise Monitoring and Observability within the Bank Command Center, you will be responsible for leading the Monitoring and Observability practice and capability across the Bank enterprise within Systems Operations and Management. The role will establish monitoring and observability, proactive solutions, alerting, automation, and site reliability for business-critical systems and platforms. You will be responsible for managing and developing team members and project resources for delivery, onboarding, and continuous improvement for operational plans to meet the business objectives of the Bank.

You will engage with the following stakeholders:

  • Executive Management (internal and external)
  • Command Center Management
  • Incident and Problem Management
  • Change Management
  • Product Management and respective heads
  • Technical Heads, Technical resources
  • All clients and key supporting vendors
  • Relevant regulatory bodies (SARB, PASA)

Your key responsibilities include:

  • Manage, lead, and set priorities for the Monitoring and Observability team specifically focused on monitoring and observability, proactive solutions, alerting, automation, and site reliability / resilience.
  • Coach, train and develop direct reports (includes appraising job performance and conducting performance reviews)
  • Lead a team to develop enterprise logging, metrics, and traces for business-critical systems as well as dashboards (visibility) for different levels of support.
  • Work with infrastructure, product, and support teams to define tools and strategy to ensure full observability, alerting, and proactive monitoring of business-critical systems.
  • Integrate full observability and proactive monitoring practice within Systems Operations and Management to ensure tracking and timely communication of events, outages, and issues.
  • Collaborate with Business and IT stakeholders to define thresholds, SLAs, and runbooks, help proactively identify issues, and drive down recurring incidents.
  • Lead oversight of third-party vendors’ work to ensure vendors fulfil contractual commitments and statements of work (SOW)
  • Assist with monitoring events (e.g., warnings and exceptions) and identify routine activities and resolutions that can be automated to improve system and process efficiencies for the Command Center.
  • Serve as a subject matter expert and maintain knowledge of current industry trends and developing or related technologies.
  • Ensure all activities are in compliance with rules, regulations, policies, procedures, and service resilience.
  • Serve as a Service Experience Owner for Monitoring and Observability platforms.
  • Serve as Project Leader to plan, organize, lead, and control projects and initiatives for Enterprise Monitoring and Observability.
  • Own product lifecycle management, renewals, support contracts, vendor negotiations, and product strategy.
  • Own process management and documentation for all aspects of the service, i.e. Enterprise Monitoring and Observability.
  • Drive and own the modernization of reporting and digital customer experience channels to continuously improve the customer satisfaction index.
  • Develop the strategic roadmap for Enterprise Monitoring and Observability with senior stakeholders and drive its adoption across the organization.
  • Own the transition and transformation of the service to all business departments.
  • Own the transformation initiative focusing on Machine Learning and Artificial Intelligence.
  • Train and mentor Command Center and Operations teams on existing and new developments within the service scope.

QUALIFICATIONS / KNOWLEDGE

  • Bachelor’s or associate degree in a technology field required.
  • Minimum five years’ experience in technology or related fields required.
  • Minimum three years’ experience managing people preferred.
  • Site Reliability Foundation certification preferred as a minimum.
  • Intermediate knowledge of ITSM / ITIL / ITOM, DevOps, DevSecOps, automation, and reporting.
  • Intermediate knowledge of observability and monitoring tools (Grafana Enterprise, Dynatrace, Datadog, New Relic, Opsgenie, or similar), AWS CloudWatch, and network and server monitoring tools.
  • Intermediate knowledge of AWS Cloud Technologies, Azure Cloud and Microsoft365
  • Working knowledge of service-oriented architecture (SOA), microservices, and/or API network design paradigms.
  • Working knowledge of network protocols / technology, databases, and application servers and their roles in service delivery
  • Experience using cloud native technologies (Kubernetes, Terraform, OpenTelemetry, eBPF, GitHub) in a production environment.

EXPERIENCE

  • 5 to 8 years of experience with development teams and systems owners.
  • Experience with enterprise environments and mission-critical platforms, both on premises and in the cloud (required).
  • Experience with financial services hosting providers or payment services providers.
  • Management of technical teams
  • Performance reporting and intelligent trend analysis
  • Skilled in negotiating with internal and external stakeholders or business partners.