Site Reliability Engineer

Level-Up

Johannesburg

On-site

ZAR 600,000 - 900,000

Full time

30+ days ago

Job summary

A leading company in Johannesburg seeks a skilled Site Reliability Engineer (SRE) to enhance infrastructure reliability and performance. The role involves automating processes using Ansible, managing cloud resources, and collaborating with development teams to optimize systems. Ideal candidates have a strong IT background, relevant certifications, and extensive experience in a fast-paced environment.

Qualifications

  • Minimum of 8 years in enterprise IT, with 3+ years in DevOps or SRE.
  • Relevant certifications like LPIC or Azure Administrator Associate are a plus.

Responsibilities

  • Automate and maintain IT infrastructure using Ansible.
  • Manage Windows and Linux servers, ensuring security and performance.
  • Collaborate with DevOps for Azure cloud management.

Skills

Ansible
Linux Administration
Cloud Management
Scripting
Problem Solving

Education

Bachelor's degree in Computer Science or a related field

Tools

Azure
VMware
Docker

Job description

We are looking for a skilled Site Reliability Engineer (SRE) with expertise in Ansible and Linux to join our dynamic team. The successful candidate will play a critical role in maintaining the reliability, scalability, and performance of our infrastructure, driving automation, and collaborating with development teams to optimize system efficiency.

Key Responsibilities

  1. Infrastructure Automation
    • Automate and maintain IT infrastructure using Ansible to streamline operations.
  2. System Administration (Linux and Windows)
    • Manage virtual and physical Windows and Linux servers.
    • Automate server patching and updates to ensure systems remain current.
    • Implement automated security measures for all servers.
    • Monitor server performance and health.
    • Maintain comprehensive system documentation, including configuration and troubleshooting guides.
    • Conduct troubleshooting and root cause analysis as needed.
    • Ensure robust backup, disaster recovery, and business continuity plans are in place and followed.
  3. Azure Cloud Management
    • Collaborate with DevOps to deploy, configure, and manage Azure virtual machines and resources.
    • Monitor cloud services for availability, performance, and security.
    • Work with the networking team to implement, monitor, and secure cloud networking infrastructure.
    • Ensure backup, disaster recovery, and business continuity plans are maintained for cloud systems.
  4. System Monitoring and Optimization
    • Deploy and maintain monitoring tools for proactive system oversight and alerting.
    • Analyze performance data to identify and resolve bottlenecks.
    • Conduct capacity planning to support scalability and meet business needs.
    • Partner with development teams to improve how applications perform on the infrastructure.
  5. Documentation and Collaboration
    • Create and update technical documentation, including system configurations and procedures.
    • Work with cross-functional teams to provide technical support and solutions.
    • Participate in on-call rotations and respond promptly to system emergencies.
    • Stay informed on industry trends, emerging technologies, and best practices in system administration, cloud computing, and virtualization.

Qualifications

  • Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
  • Relevant certifications (e.g., Linux Professional Institute (LPIC), Microsoft Certified: Azure Administrator Associate) are a plus.

Experience & Technical Skills

  • Minimum of 8 years in an enterprise IT environment, with at least 3 years in a DevOps or SRE role.
  • Strong expertise in Ansible for automation and configuration management.
  • Proficient in Linux system administration (installation, configuration, troubleshooting).
  • Hands-on experience with hypervisor technologies (e.g., VMware, Hyper-V, Proxmox).
  • Knowledge of containerization technologies (e.g., Docker, Kubernetes).
  • Experience managing Azure cloud services, including VMs, storage, networking, and security.
  • Proficiency in scripting languages (e.g., Bash, PowerShell, Python) for automation.

Skills & Competencies

  • Excellent problem-solving skills and the ability to work independently or as part of a high-performance team.
  • Strong sense of ownership over tasks, projects, and issues.
  • Effective communication and interpersonal skills to collaborate with stakeholders at all levels.