Observability, Automation & AI Ops Engineer - MetLife HACK4JOB

MetLife

Kuala Lumpur

On-site

MYR 80,000 - 120,000

Full time

Job summary

A leading global insurance company seeks an Observability, Automation & AI Ops Engineer in Kuala Lumpur, Malaysia. This mid-senior level position involves designing and optimizing monitoring, automation, and AI-driven operations solutions. Candidates should have expertise in observability platforms, automation tools, and cloud technologies. Responsibilities include leading technical projects and mentoring junior staff. Join our innovative team focused on digital transformation through AI and automation.

Qualifications

  • Experience levels: Associate (0-2 years), Engineer (2-5 years), Senior (5+ years).
  • Business proficiency in English is required; proficiency in Japanese is a bonus.

Responsibilities

  • Design, deploy, and manage observability platforms for IT services.
  • Implement automation solutions for infrastructure provisioning and operational workflows.
  • Mentor junior engineers and lead cross-functional project teams.

Skills

  • Proficiency in observability platforms (Elastic, Splunk, Prometheus, Grafana, OpenTelemetry)
  • Strong experience with automation tools (Ansible, Terraform, CI/CD, scripting languages)
  • Familiarity with AIOps platforms and AI/ML frameworks (Scikit‑learn, TensorFlow, PyTorch)
  • Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes)
  • Excellent troubleshooting, analytical, and communication skills
  • Ability to lead, mentor, and manage technical teams

Tools

  • Elastic
  • Ansible
  • Terraform
  • AIOps platforms (Moogsoft, Dynatrace, DataDog, Elastic)
  • Cloud platforms (AWS, Azure, GCP)
  • Kubernetes

Job description

MetLife – Kuala Lumpur, Malaysia

Observability, Automation & AI Ops Engineer

The Observability, Automation & AI Ops Engineer is responsible for designing, implementing, and optimizing advanced monitoring, automation, and AI-driven operations solutions across MetLife’s hybrid cloud and on-premises environments. This role ensures high availability, reliability, and efficiency of IT services by leveraging modern observability platforms, automation frameworks, and artificial intelligence for proactive incident management and continuous improvement.

Key Responsibilities
Observability Engineering
  • Design, deploy, and manage observability platforms (Elastic, Splunk, Prometheus, Grafana, OpenTelemetry) for end‑to‑end visibility of applications, infrastructure, and business services.
  • Develop and maintain telemetry pipelines for logs, metrics, traces, and events.
  • Build dashboards and automated alerting systems with AI‑powered anomaly detection.
  • Collaborate with DevOps, SRE, and application teams to integrate observability into CI/CD pipelines and cloud‑native architectures.
  • Analyze system health, identify trends, and drive data‑driven decisions for performance optimization and reliability.
Automation Engineering
  • Design, implement, and maintain automation solutions for infrastructure provisioning, configuration management, and operational workflows (Ansible, Terraform, CI/CD tools).
  • Develop self‑healing scripts and intelligent runbooks for automated incident response and remediation.
  • Integrate automation with monitoring and ITSM tools to streamline operations and reduce manual intervention.
  • Lead or participate in automation projects to improve efficiency, reduce errors, and support business agility.
  • Stay current with emerging automation technologies and best practices.
AI Ops Engineering
  • Implement and maintain AI‑driven systems for real‑time monitoring, predictive analytics, and automated root cause analysis.
  • Develop and train machine learning models using operational data for anomaly detection and forecasting.
  • Deploy and manage AIOps platforms (Moogsoft, Dynatrace, DataDog, Elastic) to enable proactive incident management and self‑healing capabilities.
  • Collaborate with IT, DevOps, and Data Science teams to integrate AI/ML into IT operations and service management.
  • Monitor and optimize AI model performance, ensuring reliability and continuous improvement.
Technical Leadership & Collaboration
  • (Senior Level) Mentor junior engineers, provide technical guidance, and lead cross‑functional project teams.
  • Drive adoption of observability, automation, and AI Ops best practices across the organization.
  • Participate in technology evaluations, pilots, and rollouts of new solutions.
Qualifications & Skills
Experience
  • Associate: 0–2 years in observability, automation, or IT operations.
  • Engineer: 2–5 years relevant experience.
  • Senior: 5+ years with demonstrated technical and/or team leadership.
Skills
  • Proficiency in observability platforms (Elastic, Splunk, Prometheus, Grafana, OpenTelemetry).
  • Strong experience with automation tools (Ansible, Terraform, CI/CD, scripting languages).
  • Familiarity with AIOps platforms and AI/ML frameworks (Scikit‑learn, TensorFlow, PyTorch).
  • Experience with cloud platforms (AWS, Azure, GCP) and container orchestration (Kubernetes).
  • Excellent troubleshooting, analytical, and communication skills.
  • (Senior Level) Ability to lead, mentor, and manage technical teams.
Preferred Certifications
  • Relevant certifications in observability, automation, cloud, or AI/ML platforms are a plus.
  • ITIL v4
Language Requirements
  • Business proficiency in English.
  • Proficiency in Japanese is an added bonus.
Why This Role Matters

This role is critical to MetLife’s digital transformation, enabling proactive, data‑driven IT operations, reducing downtime, and accelerating innovation through automation and AI.

The application for this hackathon is open to individuals from all countries. The job opportunities are based in Kuala Lumpur, Malaysia.

Ready to innovate and showcase your skills? Join the MetLife Hack4Job event today—click Apply and secure your spot!

Additional Information
  • Seniority level: Mid‑Senior level
  • Employment type: Full‑time
  • Job function: Information Technology
  • Industries: Banking and Financial Services