Site Reliability Engineer / DevOps (Tech MNC)

Adecco

Singapore

On-site

SGD 70,000 - 100,000

Full time

4 days ago

Job summary

A leading staffing agency is seeking a Site Reliability Engineer in Singapore to enhance the performance and reliability of cloud-native infrastructure. The successful candidate will manage GKE clusters and implement SLIs/SLOs. Ideal candidates have 5–10 years of experience in SRE roles, strong cloud platform knowledge, and proficiency in scripting languages like Python or Go. This role is a 12-month contract with potential for extension.

Responsibilities

Maintain performance, scalability, and reliability of cloud-native infrastructure.
Define and implement SLIs, SLOs, and error budgets.
Build and maintain observability stacks using tools such as Prometheus, Grafana, and ELK.
Design, deploy, and manage infrastructure on cloud platforms.
Manage and optimize GKE clusters.

Skills

SRE experience

DevOps experience

Cloud platforms

Python scripting

Go scripting

Bash scripting

Kubernetes

SQL

Distributed systems debugging

Communication skills

Tools

Google Cloud Platform

Terraform

Helm

Prometheus

Grafana

ELK

PagerDuty

Opsgenie

Adecco is partnering with our client, a well-known US tech MNC.

The Opportunity

Adecco is partnering with a global tech MNC, and we are looking for a Site Reliability Engineer (SRE) to join their Platform Engineering team.
This role is focused on maintaining the performance, scalability, and reliability of cloud-native infrastructure while driving automation, operational excellence, and robust incident management practices. You will work closely with cross-functional teams to build resilient systems that support global-scale applications.
The role will start off as a 12-month contract assignment with potential for extension or conversion, depending on performance and business needs.

The Job:

Reliability Engineering & Observability:

Define and implement SLIs, SLOs, and error budgets to track and improve system reliability.
Build and maintain observability stacks using tools such as Prometheus, Grafana, ELK, or Stackdriver.

Cloud & Infrastructure Management:

Design, deploy, and manage infrastructure on Google Cloud Platform (GCP) or other cloud providers.
Use Infrastructure as Code (IaC) tools and practices such as Terraform, Helm, and GitOps for scalable, repeatable deployments.

Kubernetes & Platform Operations:

Manage and optimize Google Kubernetes Engine (GKE) clusters for performance, availability, and security.
Build and maintain APIs and platform tools that support internal operations.

Incident Management & Support:

Participate in on‑call rotations, providing L2/L3 support for production systems.
Lead incident response, perform root cause analysis, and document postmortems.
Collaborate with teams to reduce Mean Time to Resolution (MTTR) and improve incident handling processes.

Automation & Scripting:

Develop automation tools and scripts in Python, Go, or Bash to eliminate manual tasks and improve operational efficiency.

Collaboration & DevOps Enablement:

Partner with engineering, QA, and product teams to embed reliability best practices into the development and deployment lifecycle.

The Talent:

Must-Have Skills & Experience:

5–10 years of experience in SRE, DevOps, or Infrastructure Engineering roles.
Strong hands‑on experience with cloud platforms, especially GCP.
Proficient in Python, Go, or Bash for scripting and automation.
Deep knowledge of Kubernetes, especially in GKE environments.
Solid experience with SQL and relational databases.
Proven track record in defining and managing SLIs/SLOs and reliability metrics.
Familiarity with RESTful APIs and microservices architecture.
Strong debugging and troubleshooting skills in distributed systems.
Excellent interpersonal and communication skills.

Nice-to-Haves:

Cloud certifications (e.g., Google Cloud Professional Cloud DevOps Engineer).
Experience with incident management tools (e.g., PagerDuty, Opsgenie).
Exposure to DevOps, CI/CD pipelines, and Agile methodologies.
Understanding of cloud security and compliance practices.

Next Step:

Send your updated resume to JiaYi.Lim@adecco.com
Email subject: Apply – Site Reliability Engineer (SRE)
Only shortlisted candidates will be contacted.

Lim Jia Yi