Manager Enterprise Monitoring And Observability

PartnerUp (Pty) Ltd

Johannesburg

On-site

ZAR 600 000 - 1 000 000

Full time

30+ days ago

Job summary

An established industry player is seeking a Manager of Enterprise Monitoring and Observability to lead a dynamic team within their Bank Command Center. This pivotal role involves establishing proactive monitoring solutions and ensuring the resilience of business-critical systems. You will engage with various stakeholders, manage team performance, and drive automation initiatives to enhance operational efficiency. If you have a strong background in technology management and a passion for continuous improvement, this opportunity offers a chance to make a significant impact in a collaborative environment focused on innovation and excellence.

Qualifications

5+ years in technology with 3+ years in management preferred.
Intermediate knowledge of ITSM, DevOps, and cloud technologies required.
Experience with observability and monitoring tools essential.

Responsibilities

Lead the Monitoring and Observability team focused on proactive solutions.
Develop enterprise logging and metrics for critical systems.
Collaborate with stakeholders to define SLAs and thresholds.

Skills

Leadership

Team Management

Automation

Incident Management

Problem Management

Communication Skills

Cloud Technologies

Service-Oriented Architecture

Negotiation Skills

Performance Reporting

Education

Bachelor's degree in technology

Associate degree in technology

Tools

Grafana Enterprise

Dynatrace

Datadog

New Relic

Opsgenie

AWS CloudWatch

Kubernetes

Terraform

OpenTelemetry

GitHub

As a Manager of Enterprise Monitoring and Observability within the Bank Command Center, you will be responsible for leading the Monitoring and Observability practice and capability across the Bank enterprise within Systems Operations and Management. The role will establish monitoring and observability, proactive solutions, alerting, automation, and site reliability for business-critical systems and platforms. You will be responsible for managing and developing team members and project resources for delivery, onboarding, and continuous improvement for operational plans to meet the business objectives of the Bank.

You will engage with the following stakeholders:

Executive Management (internal and external)
Command Center Management
Incident and Problem Management
Change Management
Product Management and respective heads
Technical Heads, Technical resources
All clients and key supporting vendors
Relevant regulatory bodies (SARB, PASA)

Your key responsibilities include:

Manage, lead, and set priorities for the Monitoring and Observability team specifically focused on monitoring and observability, proactive solutions, alerting, automation, and site reliability / resilience.
Coach, train, and develop direct reports (includes appraising job performance and conducting performance reviews).
Lead a team to develop enterprise logging, metrics, and traces for business-critical systems as well as dashboards (visibility) for different levels of support.
Work with infrastructure, product, and support teams to define tools and strategy to ensure full observability, alerting, and proactive monitoring of business-critical systems.
Integrate full observability and proactive monitoring practice within Systems Operations and Management to ensure tracking and timely communication of events, outages, and issues.
Collaborate with Business and IT stakeholders to define thresholds, SLAs, and runbooks, help proactively identify issues, and drive down recurring incidents.
Lead oversight of third-party vendors' work to ensure vendors fulfil contractual commitments and statements of work (SOW).
Assist with monitoring events (e.g., warnings and exceptions) and identify routine activities and resolutions that can be automated to improve system and process efficiencies for the Command Center.
Serve as a subject matter expert and maintain knowledge of current industry trends and developing or related technologies.
Ensure all activities are in compliance with rules, regulations, policies, procedures, and service resilience.
Serve as a Service Experience Owner for Monitoring and Observability platforms.
Serve as Project Leader to plan, organize, lead, and control projects and initiatives for Enterprise Monitoring and Observability.
Own product lifecycle management, renewals, support contracts, vendor negotiations, and product strategy.
Own the process management and documentation for all aspects of the Service, i.e., Enterprise Monitoring and Observability.
Drive and own the modernization of reporting and digital customer experience channels to continuously improve the customer satisfaction index.
Develop the strategic roadmap for Enterprise Monitoring and Observability with Senior Stakeholders and drive its adoption across the organization.
Own the transition and transformation of the service to all business departments.
Own the transformation initiative focusing on Machine Learning and Artificial Intelligence.
Responsible for the training and mentoring of Command Center and Operations Teams on existing and new developments within the Service Scope.

QUALIFICATIONS / KNOWLEDGE

Bachelor's degree or associate degree in a technology field required.
Minimum five years’ experience in technology or related fields required.
Minimum three years’ experience managing people preferred.
Site Reliability Foundation, at minimum, preferred.
Intermediate knowledge of ITSM / ITIL / ITOM, DevOps, DevSecOps, automation, and reporting.
Intermediate knowledge of observability and monitoring tools (Grafana Enterprise, Dynatrace, Datadog, New Relic, Opsgenie, or similar), AWS CloudWatch, and network and server monitoring tools.
Intermediate knowledge of AWS cloud technologies, Azure cloud, and Microsoft 365.
Working knowledge of service-oriented architecture (SOA), microservices, and/or API network design paradigms.
Working knowledge of network protocols / technology, databases, and application servers, and their roles in service delivery.
Experience using cloud native technologies (Kubernetes, Terraform, OpenTelemetry, eBPF, GitHub) in a production environment.

EXPERIENCE

5 to 8 years of experience with development teams and systems owners.
Experience with enterprise environments and mission-critical platforms, both on premises and in the cloud (required).
Experience with financial services hosting providers or payment services providers.
Management of technical teams.
Performance reporting and intelligent trend analysis.
Skilled in negotiating with internal and external stakeholders or business partners.
