Site Reliability Engineer

Nedbank

Johannesburg

On-site

ZAR 600 000 - 900 000

Full time

Today

Job summary

A leading financial services provider in Johannesburg seeks an experienced IT professional specializing in Site Reliability Engineering. The role involves ensuring the reliability and efficiency of technology solutions while mentoring junior engineers. Candidates should have a strong IT background with a minimum of 8 years' experience in relevant technologies. This position offers the opportunity to influence operational practices across the organization in a dynamic team environment.

Qualifications

  • Minimum of 8 years in IT with 5 years in relevant domains.
  • Preferred certifications in AWS/Azure, ITIL, or DevOps.
  • Proficient in technical communication.

Responsibilities

  • Collaborate with stakeholders to enhance reliability services.
  • Engage with teams to adopt SRE practices.
  • Implement automated solutions for high availability.

Skills

  • Microservices and containerization (K8s or Docker)
  • Troubleshooting and root cause analysis
  • Site Reliability Engineering best practices
  • DevOps framework
  • Infrastructure and application monitoring
  • Incident management and post-incident analysis
  • Tech savvy
  • Decision making
  • Building networks
  • Communication
  • Troubleshooter

Education

  • Matric / Grade 12 / National Senior Certificate
  • Advanced Diplomas/National 1st Degrees
  • B-Tech Computer Systems, BSc - Info Sys/Computer Systems

Job description

REQ 138974 - Keabetswe Modise

Closing Date: 05 December 2025

Job Family

Information Technology

Application Development

Manage Self: Professional

Job Purpose

To serve as an IT professional specialising in Site Reliability Engineering (SRE) at Nedbank, contributing to the strategic capability of the organisation as part of a dynamic team. The role is focused on advancing the SRE discipline and working with other domains to influence its adoption. It is a strategic, consultancy-based role that involves enabling and contributing to solutions aligned with the principles of reliability, availability, and resilience, while also promoting frequent and efficient delivery from development teams.

Job Responsibilities
  • Collaborate with stakeholders, engineers, and operational SMEs to keep all relevant parties up to date on what is top of mind within the reliability service offerings.
  • Evolve production services based on customer needs and technology to ensure we remain competitive in the financial services industry/market.
  • Influence squads during service or platform design to prevent system failures and improve performance.
  • Engage with leadership and teams to adopt SRE practices, with a core focus on contributing to incident management and advocating for blameless postmortems.
  • Engage and influence all teams involved in the software development life cycle with regard to observability and high availability, utilising new or existing technology, and improve disaster recovery plans.
  • Implement automation-based solutions to improve the availability, efficiency, and performance of systems while reducing cost.
  • Coach teams on best practices within the organisation via internal forums to position SRE fundamental knowledge and promote enterprise-wide knowledge sharing.
  • Assist with creating and maintaining system health and performance metrics reflecting real-time data, enabling proactive resolution, and faster troubleshooting.
  • Collaborate and partner with DevOps engineer/coach to ensure efficient continuous integration/continuous deployment pipelines and resolve any failures or improve the flow.
  • Take charge of technical leadership, engage with teams to identify the best solutions, and mentor junior Site Reliability Engineers to resolve technical challenges.
  • Assist in defining and implementing metrics such as SLIs and SLOs to gain insight into user experience and application performance.
  • Define and deliver technical standards in partnership with all disciplines of software engineering for adoption of site reliability engineering.
  • Participate in and work closely with relevant COEs to improve the release of new features and time to market.
  • Build and maintain strategic relationships with the business units and vendors to be in sync on current ways of work and business decisions that are being embraced.
  • Conduct maturity assessments within teams to measure the level of SRE adoption, and use the results to outline a plan that helps teams reach the next level of maturity.
  • Utilise application monitoring tools to generate reports for informed decision making and to drive visibility of Site Reliability Engineering.
  • Adhere to and comply with Nedbank group information management, data integrity and security policies and best practices to protect client data.
  • Manage concurrent objectives, projects, groups, activities and time allocation based on prioritisation for effective delivery.
  • Stay abreast of the most recent industry trends and practices and implement learnings back into the business to ensure alignment across industry.
  • Take responsibility for the success of the team and projects by taking ownership of issues and ensuring their resolution.
  • Articulate technical concepts to diverse audiences through proficient written and verbal communication to ease the understanding of the SRE discipline.
  • Contribute to the successful implementation of the business strategy in an innovative, high-paced environment.
Essential Qualifications - NQF Level
  • Matric / Grade 12 / National Senior Certificate
  • Advanced Diplomas/National 1st Degrees
Preferred Qualification
  • B-Tech Computer Systems, BSc - Info Sys/Computer Systems or related qualification
Preferred Certifications
  • Associate or professional (Amazon Web Services/Azure Solutions), ITIL, DevOps
Minimum Experience Level
  • Minimum 8 years' IT experience, with 5 years in relevant technologies or domains
Business Drivers
  • Technical Expert
  • Analyst
  • Consultant
Technical / Professional Knowledge
  • Microservices and containerization (K8s or Docker)
  • Troubleshooting and root cause analysis
  • Site Reliability Engineering best practices
  • DevOps framework
  • Infrastructure and application monitoring
  • Incident management and post-incident analysis
  • Tech savvy
  • Decision making
  • Building networks
  • Communication
  • Troubleshooter

Please contact the Nedbank Recruiting Team at +27 860 555 566

Nedbank Ltd Reg No 1951/000009/06.
Authorised financial services and registered credit provider (NCRCP16).
