AI Infra SRE: Reliability, Autoscaling & Automation

RAZER (ASIA-PACIFIC) PTE. LTD.

Singapore

On-site

SGD 80,000 - 120,000

Full time

3 days ago

Job summary

A leading tech company in Singapore is seeking a Site Reliability Engineer to ensure the reliability and performance of AI systems. This role involves working closely with software engineers to automate operations and enhance observability in cloud environments. Major responsibilities include managing AI model servers and optimizing them for availability. A Bachelor's or Master's degree in computer science or a related field is required, along with 4+ years of SRE or DevOps experience. Familiarity with AWS, Docker, and scripting is essential.

Qualifications

  • 4+ years of relevant experience in SRE, DevOps, infrastructure engineering, or cloud operations.
  • Strong knowledge of web technologies such as HTTP, REST, and SSL.
  • Comfortable with Linux and Docker administration.

Responsibilities

  • Administer cloud-scale environments for AI model APIs and services.
  • Design fault-tolerant cloud architectures for AI workloads.
  • Build automated self-recovery systems for high availability.

Skills

Site Reliability Engineering
DevOps
Infrastructure Engineering
Cloud Operations
Linux Administration
Docker
AWS
CI/CD
Python
Bash Scripting

Education

Bachelor's or Master's degree in computer science, AI, or a similar discipline

Tools

Terraform
Jenkins
NGINX
AWS ECS
Kubernetes
Git
MySQL
NoSQL