
A leading technology firm in Kuala Lumpur seeks an experienced Senior Site Reliability Engineer to design and maintain large-scale distributed systems. You will optimize system stability and availability while coordinating responses in emergency situations. The ideal candidate has over 6 years of experience in IT, strong software engineering skills, and proficiency in programming languages like Go or Python. We are looking for someone dedicated to continuous learning and technological innovation.
Responsibilities:
Design, develop, and maintain large-scale distributed systems, ensuring their stability, availability, and scalability. Be deeply involved in the entire system design and development lifecycle, building reliability principles into the architecture so that systems have exceptional self-healing capability and scalability.
Continuously build intelligent capabilities on a world-class AIOps platform to improve deployment, monitoring, and operational efficiency. Continuously improve service user-experience metrics through operations data science.
In emergencies, act as the core decision-maker: minimize losses quickly, coordinate the response, and afterward conduct a rigorous RCA (Root Cause Analysis) to implement systematic preventive measures that keep similar issues from recurring.
Continuously monitor and manage the usage of infrastructure resources, and optimize infrastructure costs through software architecture improvements.
Write technical documents and reports to share experiences and solutions.
Requirements:
Bachelor’s degree or above (full-time study) in computer science or a related discipline.
A minimum of 6 years of working experience in IT or ICT industry.
Strong background in software engineering, proficiency in at least one mainstream programming language (such as Go, Python, or Java), and the ability to build complex distributed systems.
Deep understanding of, and practical experience with, Linux operating systems, networking principles, database principles, and container technologies (Docker/Kubernetes).
Excellent problem analysis and problem-solving skills, with the ability to think clearly and exercise sound judgment when handling critical, urgent production issues.
Strong sense of responsibility, curiosity, and a passion for continuous learning, dedicated to technological innovation and breakthroughs.