Site Reliability Engineer, Principal

AIA Hong Kong and Macau

Kuala Lumpur

On-site

MYR 150,000 - 200,000

Full time

30+ days ago

Job summary

An established industry player is seeking a Site Reliability Engineer to ensure the reliability of cloud applications. In this pivotal role, you will supervise application systems, establish automated detections, and collaborate with development teams to enhance services. Your expertise in monitoring, performance tuning, and scripting will be crucial in formulating preventive actions and improving operational efficiency. This is a fantastic opportunity to join a forward-thinking organization committed to creating a healthier, more sustainable future through innovative digital solutions. If you're passionate about technology and eager to make a difference, this role is perfect for you.

Qualifications

  • Experience in programming with Java and scripting in Shell, Bash, or PowerShell.
  • Knowledge of REST APIs and performance tuning for cloud applications.

Responsibilities

  • Ensure reliability and availability of cloud application systems.
  • Set up monitoring and build alerts for operational issues.

Skills

Java 8 or above
Scripting (Shell, Bash, PowerShell)
Performance tuning
REST API knowledge
MySQL and MSSQL optimization
Linux (RHEL or SUSE)
Git
ITIL in Agile environment
Python programming

Education

Tertiary qualification in Computer Science

Tools

Azure DevOps
Grafana
ELK
Dynatrace
Atlassian tools (Jira, Bitbucket, Confluence)
Terraform
Ansible

Job description

Locations: Kuala Lumpur, MY-AIA Malaysia
Time type: Full time
Posted: 30+ days ago
Job requisition ID: JR-45612

At AIA we’ve started an exciting movement to create a healthier, more sustainable future for everyone.

As pioneering innovators for over 100 years, we’re now transforming our organisation to be faster, simpler and more connected, because we want to be even better equipped to develop digital solutions and experiences that help more people live Healthier, Longer, Better Lives.

To get there, we need people with tech/digital/analytics expertise and passion to help develop positive, sustainable change through digitally enhanced experiences that will impact the lives of millions of people and create a healthier future for everyone.

If you believe in developing a better tomorrow, read on.

About the Role

The Site Reliability Engineer (SRE) is responsible for ensuring our cloud application systems are reliable and available to users. The SRE will supervise application systems, establish automated detection, perform root cause analysis, and formulate preventive actions. They will gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding. They will partner with development teams to improve services.

Functional Duties:

  • Set up and maintain monitoring of infrastructure and applications
  • Build alerts and auto recovery for various operational issues
  • Capture and analyze metrics from operating systems as well as applications
  • Advise in performance tuning and fault finding
  • Partner with development teams to improve services
  • Assist in formulating preventive actions where possible, lead studies of potential failure scenarios, and formulate automated recovery methods
  • Work comfortably with new tools, e.g., Azure DevOps, Grafana, ELK, Dynatrace

People Management Duties:

  • Train and mentor other consultants or teammates on your specialties
  • Act as an advisor to application teams and help them establish recovery processes

Requirements:

  • Tertiary qualification in Computer Science or any other relevant education
  • Programming Languages: Java 8 or above (must have)
  • Experience in developing and optimizing stored procedures for MySQL and MSSQL databases
  • OS: Linux (RHEL or SUSE) or Windows Server
  • Scripting (must have any one of them): Shell, Bash, PowerShell
  • Knowledge of Git, the open-source distributed version control system
  • Sound knowledge of how REST APIs work
  • Experience in Atlassian tools (e.g., Jira, Bitbucket, Confluence)
  • Familiarity with Azure Cloud services
  • Working experience with ITIL in Agile environment

Good to have:

  • Experience with Python programming language
  • Experience with containerization (Docker, AKS, ACR, EKS, ECS)
  • Experience in CI/CD with Azure DevOps
  • Experience in Dashboard development with Grafana, Azure Monitor, or Dynatrace
  • Experience in infrastructure management with Terraform or Ansible
  • Azure or AWS cloud certification would be an added advantage

Build a career with us as we help our customers and the community live Healthier, Longer, Better Lives.

You must provide all requested information, including Personal Data, to be considered for this career opportunity. Failure to provide such information may influence the processing and outcome of your application. You are responsible for ensuring that the information you submit is accurate and up-to-date.
