SITE RELIABILITY ENGINEER

LANDI INTERNATIONAL (SINGAPORE) PTE. LTD.

Singapore

On-site

SGD 60,000 - 80,000

Full time

4 days ago

Job summary

A global technology firm in Singapore is seeking a Site Reliability Engineer to ensure the operation and performance of its platform infrastructure. The role combines technical execution, proactive monitoring, and incident response. Candidates must have at least 3 years of relevant experience and strong communication skills in English. Preferred skills include cloud platforms, Linux systems, and scripting languages. The position involves collaboration with cross-functional teams, focusing on platform stability and ongoing growth.

Qualifications

  • At least 3 years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
  • Strong verbal and written communication skills in English.
  • Ability to work independently while collaborating effectively within a distributed team.

Responsibilities

  • Build, operate, and maintain platform infrastructure across multiple environments.
  • Ensure platform availability, reliability, and scalability in collaboration with R&D teams.
  • Provide operational support for production systems and participate in incident response and root cause analysis.
  • Implement and maintain monitoring, alerting, and observability solutions.
  • Analyze system performance and reliability metrics to identify improvement opportunities.

Skills

  • Cloud platforms (e.g. AWS, Azure)
  • Linux/Unix-based distributed systems
  • Programming or scripting languages (e.g. Python, Bash, Go)
  • Monitoring and observability tools (e.g. Prometheus, Grafana, Zabbix)
  • Configuration management tools (e.g. Ansible, Chef, Puppet)
  • SQL databases (e.g. PostgreSQL, MySQL)
  • Load balancing and reverse proxy technologies (e.g. Nginx)
  • CI/CD tools (e.g. Jenkins, GitLab)
  • Containerization and orchestration technologies (e.g. Docker, Kubernetes)

Job description
Job Overview

As a Site Reliability Engineer at LANDI Global, you will be responsible for the operation, reliability, and performance of the company’s platform infrastructure. You will work closely with R&D and cross-functional teams to ensure high availability, scalability, and operational excellence across multiple environments.

This role combines hands-on technical execution with proactive monitoring, incident response, and continuous improvement of systems and processes. You will play an important role in maintaining platform stability while supporting ongoing growth and new client onboarding.

Key Responsibilities
Platform Reliability & Operations
  • Build, operate, and maintain platform infrastructure across multiple environments.
  • Ensure platform availability, reliability, and scalability in collaboration with R&D teams.
  • Provide operational support for production systems and participate in incident response and root cause analysis.
  • Participate in a 24/7 on-call / standby rotation to support critical platform operations.
Monitoring, Performance & Resilience
  • Implement and maintain monitoring, alerting, and observability solutions to ensure timely detection and resolution of issues.
  • Analyze system performance, reliability metrics, and logs to identify improvement opportunities.
  • Contribute to cost optimization and capacity planning initiatives.
  • Support and maintain Disaster Recovery (DR) and business continuity plans.
Automation & DevOps Practices
  • Contribute to automation, CI/CD pipelines, and deployment processes to improve efficiency and reduce operational risk.
  • Support automated testing and release processes to ensure stable and repeatable deployments.
  • Assist with change management and incident reporting processes.
Environment & Client Support
  • Support environment provisioning and deployments for new client onboarding and platform expansions.
  • Collaborate with internal teams to ensure smooth rollout of infrastructure and application changes.
Experience & Qualifications
  • At least 3 years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
  • Strong verbal and written communication skills in English.
  • Ability to work independently while collaborating effectively within a distributed team.
Preferred Technical Skills

Candidates should have hands-on experience in several of the following areas:

  • Cloud platforms (e.g. AWS, Azure)
  • Linux/Unix-based distributed systems
  • Programming or scripting languages (e.g. Python, Bash, Go)
  • Monitoring and observability tools (e.g. Prometheus, Grafana, Zabbix)
  • Configuration management tools (e.g. Ansible, Chef, Puppet)
  • SQL databases (e.g. PostgreSQL, MySQL)
  • Load balancing and reverse proxy technologies (e.g. Nginx)
  • CI/CD tools (e.g. Jenkins, GitLab)
  • Containerization and orchestration technologies (e.g. Docker, Kubernetes)