Site Reliability Engineer

AirAsia

Kuala Lumpur

On-site

MYR 80,000 - 120,000

Full time

Today

Job summary

A prominent aviation group in Kuala Lumpur is looking for a Site Reliability Engineer. In this role, you will manage Kubernetes infrastructure, ensure system uptime, and monitor performance using Grafana and Prometheus. Ideal candidates have proven experience with cloud platforms and CI/CD pipelines, along with scripting skills in Bash, PowerShell, and Python. Strong problem-solving skills and a proactive attitude are essential in this role.

Qualifications

  • Proven experience managing Kubernetes infrastructure.
  • Understanding of API Gateways (Apigee, Kong).
  • Practical experience with cloud platforms, preferably GCP.

Responsibilities

  • Manage and maintain Kubernetes infrastructure to ensure system uptime.
  • Monitor and analyze system performance using Grafana and Prometheus.
  • Develop and maintain automation scripts using Bash and PowerShell.

Skills

Kubernetes management
GitLab and CI/CD pipelines
API Gateways
Bash scripting
Python scripting
Google Cloud Platform

Tools

Grafana
Prometheus
Terraform
Ansible

Job description

Position Title: Site Reliability Engineer (SRE)

Department: Group ICT – Infrastructure

Division: AirAsia Aviation Group

Location: RedQ

About the Department

Group ICT – Infrastructure, AirAsia Aviation

We architect and govern the core technological framework that empowers AirAsia's business and operational objectives. Our team is dedicated to delivering highly resilient and scalable infrastructure services, ensuring operational continuity and providing strategic support across the entire aviation group.

Key Responsibilities
  • Manage and maintain Kubernetes infrastructure (preferably Google Kubernetes Engine – GKE) to ensure system uptime, stability, and resilience.
  • Monitor, analyze, and manage system performance using Grafana and Prometheus.
  • Administer and manage GitLab, including version control, CI/CD pipelines, and integrations.
  • Implement automation and configuration management using scripting.
  • Develop and maintain automation scripts using Bash and PowerShell.
  • Manage and support cloud environments (preferably Google Cloud Platform – GCP).
  • Conduct system debugging, troubleshooting, and performance optimization.
  • Collaborate with internal teams to ensure service reliability, scalability, and operational efficiency.

Qualifications
Must Have
  • Proven experience managing Kubernetes infrastructure (preferably GKE).
  • Experience managing GitLab and CI/CD pipelines.
  • Understanding of API Gateways (Apigee, Kong).
  • Proficiency in Bash, PowerShell, and Python scripting.
  • Practical experience with cloud platforms (GCP preferred).
  • Exposure to AI tools (Gemini, Cursor, GPT, etc.).
  • At least 2 years of experience.
Good to Have
  • Familiarity with Cloudflare services.
  • Hands-on experience with monitoring tools such as Grafana and Prometheus.
  • Experience with Terraform for Infrastructure as Code (IaC).
  • Strong knowledge of Ansible for automation and configuration management.
  • Hands-on experience with Helm in Kubernetes environments.

Personal Attributes
  • Analytical and detail-oriented with strong problem-solving skills.
  • Proactive and self-driven with the ability to work under minimal supervision.
  • Strong sense of ownership and accountability.
  • Committed to continuous learning and process improvement.
  • Excellent debugging and troubleshooting skills.
  • Strong communication and teamwork abilities.