
A leading technology firm in Kuala Lumpur is seeking a Cloud Operations Engineer to manage high-availability production environments and lead SRE practices. The ideal candidate will have over 5 years of experience in cloud operations, expertise in GCP/AWS, and strong problem-solving skills. Along with competitive compensation and performance-based bonuses, the role offers opportunities for career development and collaboration with international teams. Proficiency in both Chinese and English is required.
Manage and optimize high-availability production environments, including servers, cloud platforms (GCP / AWS), operating systems (Linux / Windows), and middleware to ensure global system stability.
Lead system health checks, capacity planning, performance tuning, complex incident troubleshooting, and version upgrades to continuously improve reliability and performance.
Design and enhance multi-cluster and multi-cloud operations strategies (SRE practices), covering observability, automated monitoring, incident detection, and self-healing capabilities.
Develop and optimize monitoring frameworks, alerting systems, and emergency response mechanisms; participate in a 24/7 on-call rotation for high-availability support.
Oversee data backup, disaster recovery planning, log auditing, and cloud security enforcement to maintain a strong information security posture.
Drive automation for container and cloud resource management; develop internal tools (scripts/platforms) to reduce manual work and improve delivery efficiency.
Conduct post-incident reviews, summarize learnings, and drive standardization and continuous improvement across operational processes.
Bachelor’s degree (or above) in Computer Science, Information Security, or related fields, with 5+ years of experience in Cloud Operations/SRE.
Strong expertise in GCP / AWS multi-cloud architecture, including deployment, migration, scaling, and day-to-day operations.
Strong understanding of SRE methodologies, CI/CD pipelines, Infrastructure as Code (Terraform / Ansible), and DevOps/DevSecOps best practices.
Solid knowledge of networking fundamentals (TCP/IP, load balancing, routing, network security), capable of handling complex network troubleshooting.
Excellent cross-functional communication skills, with proven ability to collaborate with Engineering, Security, and Product teams.
Able to participate in a 24/7 on-call rotation, with strong problem-solving skills under pressure.
Proficient in both Chinese and English to support regional collaboration.
Relevant CNCF certifications such as CKA or CKS are an added advantage.
Global Exposure - Collaborate with international teams and lead large-scale infrastructure projects across multiple regions, gaining global technical experience and perspective.
Career Development - Expand your expertise and leadership capabilities in a fast-paced, innovation-driven environment with structured growth opportunities.
Attractive Compensation - Enjoy a competitive salary with performance-based quarterly bonuses, comprehensive benefits, and additional perks upon confirmation.
Professional Culture - Thrive in a structured, supportive, and growth-oriented workplace that values technical excellence, collaboration, and continuous learning.