AI Platform Operations Engineer (Ref 26288)

Jobline Resources Pte Ltd

Singapore

On-site

SGD 50,000 - 70,000

Full time

Today

Job summary

A leading technology services provider in Singapore is seeking a Cloud Operations Specialist to monitor and optimize Azure AI cloud platforms. You will support incident response, implement disaster recovery strategies, and ensure compliance with security policies. Ideal candidates will have a Bachelor's degree in Computer Science or Engineering and 1-2 years of experience in cloud operations, with expertise in Azure. This role offers a dynamic environment focused on continuous improvement and innovation.

Qualifications

  • 1-2 years of experience in cloud administration and/or operations.
  • Expertise in Azure Monitor, Log Analytics, and Application Insights.
  • Familiarity with AI/ML infrastructure and operational demands.

Responsibilities

  • Perform availability monitoring and outage detection of the Azure AI cloud platform.
  • Support incident response and implement disaster recovery strategies.
  • Drive continuous improvement in platform operations through automation.

Skills

Azure operations and monitoring services
Infrastructure-as-code (Terraform, Bicep, ARM)
Automation scripting (PowerShell, Python)
Problem-solving and communication skills
Experience in cloud administration and operations

Education

Bachelor’s degree in Computer Science or Engineering

Job description
Responsibilities
  • Perform availability monitoring, outage detection, and performance optimization of the Azure AI cloud platform
  • Support incident response and root cause analysis, and implement disaster recovery strategies to ensure business continuity
  • Support security audits and compliance reporting, and ensure alignment with Singtel policies, regulatory frameworks, and industry best practices
  • Collaborate with other developer teams to integrate monitoring, automation, and security best practices into AI/ML workflows
  • Drive continuous improvement in platform operations through automation, observability, and operational excellence initiatives
Requirements
  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 1-2 years of experience in cloud administration and/or operations
  • Expertise in Azure operations and monitoring services, including Azure Monitor, Log Analytics, and Application Insights
  • Proficiency in infrastructure-as-code (Terraform, Bicep, ARM) and automation scripting (PowerShell, Python)
  • Familiarity with AI/ML infrastructure (AKS, GPU VMs, data pipelines, model hosting) and their operational demands
  • Excellent problem-solving, communication, and leadership skills, especially in high-pressure incident scenarios
  • Forward-thinking ability to identify possible failure scenarios and formulate effective response plans