Site Reliability Engineer

AirAsia

Kuala Lumpur

On-site

MYR 80,000 - 120,000

Full time

Job summary

AirAsia Aviation Group is hiring a Site Reliability Engineer in Kuala Lumpur. In this role, you will manage Kubernetes infrastructure, maintain system uptime, and monitor performance using Grafana and Prometheus. Ideal candidates have proven experience with cloud platforms and CI/CD pipelines, along with scripting skills in Bash, PowerShell, and Python. Strong problem-solving skills and a proactive attitude are essential.

Qualifications

Proven experience managing Kubernetes infrastructure.
Understanding of API Gateways (Apigee, Kong).
Practical experience with cloud platforms, preferably GCP.

Responsibilities

Manage and maintain Kubernetes infrastructure to ensure system uptime.
Monitor and analyze system performance using Grafana and Prometheus.
Develop and maintain automation scripts using Bash and PowerShell.

Skills

Kubernetes management

GitLab and CI/CD pipelines

API Gateways

Bash scripting

Python scripting

Google Cloud Platform

Tools

Grafana

Prometheus

Terraform

Ansible

Position Title: Site Reliability Engineer (SRE)

Department: Group ICT – Infrastructure

Division: AirAsia Aviation Group

Location: RedQ

About the Department

Group ICT – Infrastructure, AirAsia Aviation Group

We design and govern the core technology infrastructure that supports AirAsia's business and operational goals. Our team delivers highly resilient, scalable infrastructure services, ensures operational continuity, and provides strategic support across the entire aviation group.

Key Responsibilities

Manage and maintain Kubernetes infrastructure (preferably Google Kubernetes Engine – GKE) to ensure system uptime, stability, and resilience.
Monitor, analyze, and manage system performance using Grafana and Prometheus.
Administer and manage GitLab, including version control, CI/CD pipelines, and integrations.
Implement automation and configuration management through scripting.
Develop and maintain automation scripts using Bash and PowerShell.
Manage and support cloud environments (preferably Google Cloud Platform – GCP).
Conduct system debugging, troubleshooting, and performance optimization.
Collaborate with internal teams to ensure service reliability, scalability, and operational efficiency.

Qualifications

Must Have

Proven experience managing Kubernetes infrastructure (preferably GKE).
Experience managing GitLab and CI/CD pipelines.
Understanding of API Gateways (Apigee, Kong).
Proficiency in Bash, PowerShell, and Python scripting.
Practical experience with cloud platforms (GCP preferred).
Exposure to AI tools (Gemini, Cursor, GPT, etc.).
At least 2 years of relevant experience.

Good to Have

Familiarity with Cloudflare services.
Hands-on experience with monitoring tools such as Grafana and Prometheus.
Experience with Terraform for Infrastructure as Code (IaC).
Strong knowledge of Ansible for automation and configuration management.
Hands-on experience with Helm in Kubernetes environments.

Personal Attributes

Analytical and detail-oriented with strong problem-solving skills.
Proactive and self-driven with the ability to work under minimal supervision.
Strong sense of ownership and accountability.
Committed to continuous learning and process improvement.
Excellent debugging and troubleshooting skills.
Strong communication and teamwork abilities.