Site Reliability Engineer

Second Talent

Singapore

On-site

SGD 60,000 - 90,000

Full time

30+ days ago

Job summary

A leading talent solutions firm in Singapore is seeking a Site Reliability Engineer to work on infrastructure platform development. Key responsibilities include designing and enhancing infrastructure operation platforms and ensuring maximum uptime for production services. Candidates should have 2+ years of experience in Systems Operations, DevOps, or SRE, and experience with public cloud platforms such as AWS, Azure, or GCP; a Bachelor's degree in a technical field is preferred. The role offers a chance to drive automation and process improvements.

Qualifications

  • 2+ years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering (SRE).
  • Experience with public cloud platforms such as AWS, Azure, or GCP.
  • Strong understanding of large-scale internet architecture and distributed systems.

Responsibilities

  • Design, build, and enhance infrastructure operation platforms.
  • Ensure maximum uptime for production services through proactive monitoring.
  • Lead the development of automated operations and maintenance systems.

Skills

Systems Operations experience
Scripting and automation
Containerization technologies
Infrastructure monitoring tools

Education

Bachelor's degree in Computer Science or related field

Tools

Kubernetes
Docker
AWS
Azure
GCP

Job description
Infrastructure Platform Development
  • Design, build, and enhance infrastructure operation platforms
  • Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging
  • Drive platform standardization and automation initiatives
High Availability & Reliability
  • Ensure maximum uptime for production services through proactive monitoring and incident response
  • Continuously optimize service architecture, deployment strategies, and operational processes
  • Implement and maintain SLA/SLO frameworks and reliability engineering practices
Automation & Process Improvement
  • Lead the development of automated operations and maintenance systems
  • Create self-service tools and workflows to improve team productivity
  • Establish best practices for Infrastructure as Code and configuration management
Required Qualifications
Experience & Education
  • 2+ years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering (SRE)
  • Bachelor's degree in Computer Science, Engineering, or related technical field preferred
  • Experience with public cloud platforms (AWS, Azure, or GCP) is highly valued
  • Strong understanding of large-scale internet architecture and distributed systems
  • Proven experience with infrastructure monitoring, logging, and observability tools
Technical Skills
  • Proficiency in scripting and automation using Shell, Python, or similar languages
  • Strong knowledge of containerization technologies (Kubernetes, Docker)
  • Hands-on experience operating production-grade container clusters and managing CI/CD pipelines
  • Strong familiarity with common infrastructure components: Nginx, MySQL, Redis, Kafka, Elasticsearch
Advanced Networking (Preferred)
  • Experience with Service Mesh architectures, Cilium CNI, and eBPF technologies
  • Understanding of network security, load balancing, and traffic management
  • Knowledge of cloud-native networking patterns and best practices