Senior Site Reliability Engineer (Consumer BG)

Huawei Technologies

Kuala Lumpur

On-site

MYR 100,000 - 130,000

Full time

Yesterday

Job summary

A leading technology firm in Kuala Lumpur seeks an experienced Senior Site Reliability Engineer to design and maintain large-scale distributed systems. You will optimize system stability and availability while coordinating responses in emergency situations. The ideal candidate has over 6 years of experience in IT, strong software engineering skills, and proficiency in programming languages like Go or Python. We are looking for someone dedicated to continuous learning and technological innovation.

Qualifications

  • Minimum of 6 years of working experience in the IT or ICT industry.
  • Proficient in at least one mainstream programming language like Go, Python, or Java.
  • Deep understanding and practical experience with Linux, networking, and container technologies.

Responsibilities

  • Design large-scale distributed systems and maintain their stability.
  • Optimize infrastructure costs through architectural improvements.
  • Coordinate responses during emergency situations.

Skills

Software engineering
Linux operating system
Problem analysis
Docker
Kubernetes
Go
Python
Java

Education

Bachelor’s degree in computer science or related discipline

Job description

Design, develop, and maintain large-scale distributed systems with a focus on stability, availability, and scalability. Be deeply involved in the entire lifecycle of system design and development, bringing reliability principles into the architecture so that the system has exceptional self-healing capabilities and scalability.

Continuously build intelligent capabilities on a world-class AIOps platform to improve system deployment, monitoring, and operational efficiency. Continuously improve service user-experience metrics through operations data science.

In emergency situations, you will be the core decision-maker, responsible for quickly minimizing losses and coordinating responses. Afterward, you will conduct a rigorous RCA (Root Cause Analysis) and implement systematic preventive measures so that similar issues do not recur.

Continuously monitor and manage the usage of infrastructure resources, and optimize infrastructure costs through software architecture improvements.

Write technical documents and reports to share experiences and solutions.

Requirements:

Full-time Bachelor’s degree or above in computer science or related discipline.

A minimum of 6 years of working experience in the IT or ICT industry.

Strong background in software engineering, proficient in at least one mainstream programming language (such as Go, Python, or Java), and capable of building complex distributed systems.

Deep understanding of and practical experience with Linux operating systems, networking principles, database principles, and container technologies (Docker/Kubernetes).

Excellent problem analysis and problem-solving skills, with the ability to maintain clear thinking and sound judgment during critical and urgent production incidents.

Strong sense of responsibility, curiosity, and a passion for continuous learning, dedicated to technological innovation and breakthroughs.
