Site Reliability Engineer

TP-LINK CORPORATION PTE. LTD.

Singapore

On-site

SGD 60,000 - 80,000

Full time

3 days ago

Job summary

A leading technology company in Singapore is seeking a Site Reliability Engineer to implement and operate Microservices on Kubernetes cloud platforms. The ideal candidate has a Bachelor's degree in Computer Science and 1+ year of experience in SRE, with proficiency in programming languages like Java, Python, and Bash. Responsibilities include deploying services to Multi-Cloud Platforms and analyzing production risks. This role requires strong problem-solving skills and the ability to mentor team members.

Qualifications

  • 1+ year of experience as a Site Reliability Engineer.
  • Hands-on experience in SRE, DevOps, and cloud security best practices.
  • Experience in developing and maintaining technical documentation.

Responsibilities

  • Implement and operate Microservices on Kubernetes cloud platforms.
  • Deploy services to the Multi-Cloud Platform.
  • Analyze and resolve production risks.

Skills

Proficient in programming languages like Java
Proficient in Python
Proficient in Bash
Proficient in PowerShell
Strong problem-solving skills
Experience in cloud operations
Mentoring and training skills

Education

Bachelor's degree in Computer Science
Bachelor's degree in Information Technology
Related field degree

Tools

Kubernetes
AWS
OCI
Azure
GCP

Job description

Responsibilities:

  • Serve as technical SME for implementing and operating Microservices on Kubernetes cloud-based platforms.
  • Collaborate with the Cloud Technical Development and DevOps teams to deploy services to the Multi-Cloud Platform.
  • Perform load tests and chaos tests to ensure the scalability and reliability of microservices.
  • Build Observability for Microservices and cloud platforms like AWS, OCI, Azure, and GCP.
  • Write and execute disaster recovery plans in collaboration with the Development and DevOps teams.
  • Analyze and resolve production risks caused by insufficient resources, such as node groups, CPU, memory, HPA scheduling, JVM pre-warming, etc.
  • Write and maintain scripts for automation using languages like Python, Go, or Bash.
  • Define and maintain the KPIs (SLA/SLO/SLI) for all cloud microservices with development teams to better understand the business.
  • Create and maintain technical documentation, including architecture diagrams, design documents, and standard operating procedures.
  • Ensure adherence to security and compliance standards, including ISO 27001, SOC 2, and GDPR.
  • Lead incident response efforts to troubleshoot and resolve production issues quickly.
  • Perform post-incident analysis to identify root causes and potential workarounds/solutions.
  • Assist with product/technology selection, including implementation of POCs.
  • Be flexible and open to change and to evolving processes and tools.
  • Help mentor and train less senior members of the team.
  • Participate in the on-call rotation and provide support after work hours and on weekends.
  • Other duties as assigned.

Requirements:

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 1+ year of experience as a Site Reliability Engineer.
  • Proficiency in programming and scripting languages like Java, Python, Bash, or PowerShell.
  • Hands-on experience in SRE, DevOps, cloud operations, and cloud security best practices.
  • Strong knowledge of security technologies, including Identity and access management, Network security, Application security, and Data protection.
  • Strong problem-solving and analytical skills, with the ability to work independently and as part of a team.
  • Experience in developing and maintaining technical documentation and implementing compliance requirements.

Additional Skills (Preferred):

  • Expert-level cloud certifications, such as AWS Solutions Architect Professional, Azure Solutions Architect Expert, and GCP Professional Cloud Architect.
  • Experience with container orchestration technologies (e.g., Kubernetes).