Adecco is partnering with our client, a well-known US tech MNC.
The Opportunity:
- Our client, a global tech MNC, is looking for a Site Reliability Engineer (SRE) to join its Platform Engineering team.
- This role is focused on maintaining the performance, scalability, and reliability of cloud-native infrastructure while driving automation, operational excellence, and robust incident management practices. You will work closely with cross-functional teams to build resilient systems that support global-scale applications.
- The role starts as a 12-month contract, with potential for extension or conversion depending on performance and business needs.
The Job:
Reliability Engineering & Observability:
- Define and implement SLIs, SLOs, and error budgets to track and improve system reliability.
- Build and maintain observability stacks using tools such as Prometheus, Grafana, ELK, or Google Cloud's operations suite (formerly Stackdriver).
Cloud & Infrastructure Management:
- Design, deploy, and manage infrastructure on Google Cloud Platform (GCP) or other cloud providers.
- Use Infrastructure as Code (IaC) tools such as Terraform and Helm, together with GitOps workflows, for scalable and repeatable deployments.
Kubernetes & Platform Operations:
- Manage and optimize Google Kubernetes Engine (GKE) clusters for performance, availability, and security.
- Build and maintain APIs and platform tools that support internal operations.
Incident Management & Support:
- Participate in on‑call rotations, providing L2/L3 support for production systems.
- Lead incident response, perform root cause analysis, and document postmortems.
- Collaborate with teams to reduce Mean Time to Resolution (MTTR) and improve incident handling processes.
Automation & Scripting:
- Develop automation tools and scripts in Python, Go, or Bash to eliminate manual tasks and improve operational efficiency.
Collaboration & DevOps Enablement:
- Partner with engineering, QA, and product teams to embed reliability best practices into the development and deployment lifecycle.
The Talent:
Must-Have Skills & Experience:
- 5–10 years of experience in SRE, DevOps, or Infrastructure Engineering roles.
- Strong hands‑on experience with cloud platforms, especially GCP.
- Proficient in Python, Go, or Bash for scripting and automation.
- Deep knowledge of Kubernetes, especially in GKE environments.
- Solid experience with SQL and relational databases.
- Proven track record in defining and managing SLIs/SLOs and reliability metrics.
- Familiarity with RESTful APIs and microservices architecture.
- Strong debugging and troubleshooting skills in distributed systems.
- Excellent interpersonal and communication skills.
Nice-to-Haves:
- Cloud certifications (e.g., Google Cloud Professional Cloud Architect or Professional Cloud DevOps Engineer).
- Experience with incident management tools (e.g., PagerDuty, Opsgenie).
- Exposure to DevOps, CI/CD pipelines, and Agile methodologies.
- Understanding of cloud security and compliance practices.
Next Step:
- Send your updated resume to JiaYi.Lim@adecco.com
- Email Title: Apply – Site Reliability Engineer (SRE)
- Only shortlisted candidates will be contacted.
Lim Jia Yi