
A leading astronomy organization in South Africa seeks a Senior Computer Systems Engineer to lead the compute and storage systems team for SKA-Mid. This role focuses on managing scalable operations and mentoring team members while driving innovation and ensuring high-performance infrastructure. Candidates should have a strong technical background in system design and operations management, with experience in distributed systems.
The SKA Observatory (SKAO) is a next-generation radio astronomy facility that will revolutionise our understanding of the Universe and the laws of fundamental physics.
Enabled by cutting‑edge technology, it promises to have a major impact on society, in science and beyond. In South Africa, the SKAO is collaborating with SARAO to operate and support the construction of the mid‑frequency telescope (SKA‑Mid) in the remote Karoo region. The SKA‑Mid Senior Computer Systems Engineer will lead the compute and storage systems team for SKA‑Mid and will report to the SKA‑Mid Site Reliability Engineering (SRE) Manager within SKA‑Mid Computing & Software. The role provides hands‑on technical leadership in the design, implementation, and long‑term operation and maintenance of secure, reliable, and high‑performance computer systems infrastructure for the telescopes hosted by SARAO. Beyond contributing to computing systems enablement, the role focuses on shaping operational practices, supporting local delivery partnerships, and helping build the team that will manage computing systems operations as the telescopes transition from construction to steady‑state operations. It also involves guiding infrastructure development, mentoring team members, and ensuring that systems align with SRE principles.
Responsibilities include deploying and optimising systems, managing faults, contributing to long‑term infrastructure planning, and ensuring scalable, maintainable operations. The position plays a key role in cross‑team collaboration, driving innovation while supporting sustainable and resilient computing environments.
Key responsibilities:
- Contribute to the global design and implementation of scalable and fault‑tolerant infrastructure systems that support engineering and operational needs.
- Contribute to the deployment, configuration, and maintenance of distributed storage and database systems.
- Analyse system failures, performance issues, and misconfigurations across hardware, software, and network layers.
- Lead and mentor the computer systems engineers and contribute to strategic technical planning.