Site Reliability Engineer - Scalable Infra for AI

SECOND TALENT SG PTE. LTD.

Singapore

On-site

SGD 60,000 - 100,000

Full time

Today

Job summary

A dynamic AI startup in Asia is seeking a Site Reliability Engineer to own the reliability and scalability of its global systems. The ideal candidate has a Bachelor's degree in Computer Science or equivalent and at least 3 years of experience in SRE, DevOps, or system operations. Responsibilities include managing container and open-source infrastructure clusters, building CI/CD pipelines, and troubleshooting critical incidents. Strong skills in Linux, Kubernetes, and cloud platforms (AWS, GCP, Azure) are essential. Join us in making a difference in the AI landscape.

Qualifications

  • 3+ years in SRE, DevOps, or system operations.
  • Self-driven with a strong problem-solving mindset.

Responsibilities

  • Manage container and open-source infrastructure clusters.
  • Build and maintain CI/CD pipelines, monitoring, and logging tools.
  • Troubleshoot and resolve critical incidents rapidly.
  • Enhance system availability through architectural improvements.
  • Drive automation across all levels of operations.
  • Work closely with engineering to champion infrastructure best practices.
  • Participate in 24/7 support rotations.

Skills

Linux
Shell/Python scripting
System-level performance tuning
Cloud platforms (AWS, GCP, Azure)
Kubernetes
Docker
GitLab CI
ArgoCD
MySQL
Redis
Kafka
Nginx
Elasticsearch

Education

Bachelor's degree in Computer Science or equivalent

Job description

Overview

Be the infrastructure hero for one of Asia’s most dynamic AI startups. This is an opportunity to own reliability, scalability, and efficiency across global systems.

Requirements
  • Bachelor's degree in Computer Science or equivalent.
  • 3+ years in SRE, DevOps, or system operations.
  • Strong with Linux, Shell/Python scripting, and system-level performance tuning.
  • Hands-on with cloud platforms (AWS, GCP, Azure).
  • Advanced knowledge of Kubernetes, Docker, GitLab CI, and ArgoCD.
  • Familiarity with managing MySQL, Redis, Kafka, Nginx, Elasticsearch, and JVM applications.
  • Self-driven with a strong problem-solving mindset.