A leading company in the IT industry is seeking a Site Reliability Engineer to enhance their operations in Johannesburg. The role involves automating IT infrastructure, managing cloud services, and ensuring system performance and security. Ideal candidates will have extensive experience in IT environments, strong skills in Ansible, and proficiency in Azure cloud management.
Our client in the IT industry is looking for a Site Reliability Engineer to join their team in Johannesburg North, Gauteng.

Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
- Relevant certifications (e.g., Linux Professional Institute Certification (LPIC), Microsoft Certified: Azure Administrator Associate) are a plus.

Experience & Technical Skills
- Minimum of 8 years in an enterprise IT environment, with at least 3 years in a DevOps or SRE role.
- Strong expertise in Ansible for automation and configuration management.
- Proficient in Linux system administration (installation, configuration, troubleshooting).
- Hands-on experience with hypervisor technologies (e.g., VMware, Hyper-V, Proxmox).
- Knowledge of containerization technologies (e.g., Docker, Kubernetes).
- Experience managing Azure cloud services, including VMs, storage, networking, and security.
- Proficiency in scripting languages (e.g., Bash, PowerShell, Python) for automation.

Key Responsibilities

Infrastructure Automation
- Automate and maintain IT infrastructure using Ansible to streamline operations.

System Administration (Linux and Windows)
- Manage virtual and physical Windows and Linux servers.
- Automate server patching and updates to keep systems current.
- Implement automated security measures for all servers.
- Monitor server performance and health.
- Maintain comprehensive system documentation, including configuration and troubleshooting guides.
- Conduct troubleshooting and root cause analysis as needed.
- Ensure robust backup, disaster recovery, and business continuity plans are in place and followed.

Azure Cloud Management
- Collaborate with DevOps to deploy, configure, and manage Azure virtual machines and resources.
- Monitor cloud services for availability, performance, and security.
- Work with the networking team to implement, monitor, and secure cloud networking infrastructure.
- Ensure backup, disaster recovery, and business continuity plans are maintained for cloud systems.

System Monitoring and Optimization
- Deploy and maintain monitoring tools for proactive system oversight and alerting.
- Analyze performance data to identify and resolve bottlenecks.
- Conduct capacity planning to support scalability and meet business needs.
- Partner with development teams to improve application performance on the infrastructure.

Documentation and Collaboration
- Create and update technical documentation, including system configurations and procedures.
- Work with cross-functional teams to provide technical support and solutions.
- Participate in on-call rotations and respond promptly to system emergencies.
- Stay informed on industry trends, emerging technologies, and best practices in system administration, cloud computing, and virtualization.