Lead AI Platform Operations Engineer - #AIDA

Singtel Group

Singapore

On-site

SGD 90,000 - 120,000

Full time

Today

Job summary

A leading telecommunications company in Singapore is seeking a Lead AI Platform Operations Engineer to manage the Azure-based AI cloud platform. This role involves ensuring high availability, cybersecurity operations, and disaster recovery processes. The ideal candidate will hold a Bachelor’s degree and have extensive experience in cloud administration, particularly with Azure. Join us in driving AI transformation and empowering growth in this dynamic field.

Qualifications

  • 6 years of experience in cloud administration and/or operations.
  • Deep expertise in Azure operations and monitoring services.
  • Hands-on experience with cloud security operations.

Responsibilities

  • Ensure high availability and reliability of the Azure AI cloud platform.
  • Design disaster recovery and business continuity processes.
  • Oversee cybersecurity operations and compliance.

Skills

Cloud administration
Azure operations
Incident management
Disaster recovery design
Cloud security operations
Automation scripting
Problem-solving
Leadership

Education

Bachelor’s degree in Computer Science, Engineering or related field

Tools

Azure Monitor
Log Analytics
Application Insights
Terraform
PowerShell

Job description

To lead the next phase of our AI evolution, we’ve launched a new business unit, AIDA (Artificial Intelligence & Data Analytics), a strategic engine driving our transformation, designed to scale our AI ambitions with precision and purpose. This marks a pivotal shift in how we operate, innovate, and serve, embedding intelligence into every layer of our business.

At Singtel, this is more than a technology upgrade. It’s a strategic transformation that redefines how value is created across the enterprise core, augmenting human capabilities and unlocking entirely new potential. It is a journey that aligns people, platforms, and processes under one cohesive strategy. Our mission is to build AI literacy and foster a culture where intelligence empowers people.

We welcome you to join us on a transformational journey that’s reshaping the telecommunications industry and redefining what’s possible with AI at its core. Grow with us in a workplace that champions innovation, embraces agility, and puts human potential at the heart of everything we do.

Be a Part of Something BIG!

  • Ensure the high availability, reliability, and performance of our Azure-based AI cloud platform
  • Lead proactive monitoring, outage detection, and incident response to minimize downtime and operational risk
  • Design and maintain disaster recovery and business continuity processes to safeguard critical AI workloads
  • Oversee cybersecurity operations, including vulnerability management, audits, and compliance with security standards for AIDA’s AI platform
  • Collaborate closely with MLOps, LLMOps, and engineering teams to integrate automation, observability, and security best practices into platform operations

Make an Impact by:

  • Lead availability monitoring, outage detection, and performance optimization of our Azure AI cloud platform
  • Manage incident response and root cause analysis, and implement disaster recovery strategies to ensure business continuity
  • Oversee cybersecurity operations including vulnerability management, threat detection, and access control enforcement
  • Handle security audits and compliance reporting, and ensure alignment with Singtel policies, regulatory frameworks, and industry best practices
  • Collaborate with other developer teams to integrate monitoring, automation, and security best practices into AI/ML workflows
  • Drive continuous improvement in platform operations through automation, observability, and operational excellence initiatives
  • Lead the AIDA AI platform operations function and coordinate the distribution of work within the team

Skills for Success:

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 6 years of experience in cloud administration and/or operations
  • Deep expertise in Azure operations and monitoring services including Azure Monitor, Log Analytics, Application Insights
  • Strong background in incident management, SRE practices, and disaster recovery design
  • Hands-on experience with cloud security operations: IAM, SIEM/SOAR, vulnerability management, firewalls, endpoint detection
  • Proficiency in infrastructure-as-code (Terraform, Bicep, ARM) and automation scripting (PowerShell, Python)
  • Familiarity with AI/ML infrastructure (AKS, GPU VMs, data pipelines, model hosting) and their operational demands
  • Knowledge of security compliance frameworks (ISO 27001, CIS, NIST)
  • Excellent problem-solving, communication, and leadership skills, especially in high-pressure incident scenarios
  • Forward-thinking ability to identify possible failure scenarios and formulate effective response plans

Are you ready to say hello to BIG Possibilities?

Take the leap with Singtel to unlock new opportunities and accelerate your growth. Apply now and start your empowering career!
