Platform Engineer (Cloud SRE Ops)

Assurity Trusted Solutions Pte Ltd

Singapore

On-site

SGD 80,000 - 100,000

Full time

Today

Job summary

A leading technology firm in Singapore is seeking an experienced Infrastructure Engineer to join their Digital Resiliency Engineering team. You will build centralized services for observability and automation while ensuring the reliability of mission-critical systems. The ideal candidate will have over 6 years of experience in tech operations, a strong background in DevOps, and a passion for automating solutions. This role involves collaboration, technical leadership, and performance optimization in a dynamic environment.

Benefits

Learning culture and growth opportunities
Annual Leave Benefits
Family Care Leave
Birthday Leave
Benefits for Contract Staff similar to Permanent Employees

Qualifications

  • 6+ years of experience in technology operations as an Infrastructure Engineer or Site Reliability Engineer.
  • Expertise in building and operating automated monitoring and incident detection systems.
  • Experience leading highly complex technical projects with multiple dependencies.
  • Proficient in building and managing highly available and scalable IT infrastructure.
  • Ability to communicate complex concepts clearly to different audiences.

Responsibilities

  • Build KPIs such as SLIs and SLOs for critical Government services.
  • Provide operational support and engineering for large-scale systems.
  • Gather metrics and logs for capacity planning and performance tuning.
  • Build automation for services, infrastructure, and applications.
  • Measure and optimize system performance for continuous improvement.

Skills

DevOps
Infrastructure engineering
Site Reliability Engineering
Automation solutions
Agile development
Problem-solving skills
Communication skills

Tools

Python
PowerShell
Ruby
SaltStack
Puppet
Terraform
Ansible

Job description

In Digital Resiliency Engineering (DRE), we combine software and systems engineering to build and operate large-scale, distributed systems designed and/or built by the Singapore Government. We ensure Government services are reliable, meet expected performance, and satisfy customer needs.

If you have a strong DevOps, infrastructure engineering and/or SRE background, have experience operating mission-critical production technology infrastructure at scale, and are looking for opportunities to work with a team of practitioners and leading industry experts, we welcome you to join us.

In this role, you will build central services for observability and automation of infrastructure services. You will be part of a rotation with other engineers, providing rapid response to major incidents impacting critical Government services. You will provide technical leadership for the team and work closely with technical leads to operate highly available solutions. You will also guide other team members on managing the availability and performance of mission-critical services, building automation and monitoring solutions to prevent problem recurrence, and building automated responses for non-exceptional service conditions.

You will also manage the execution of project priorities, deadlines and deliverables, and lead the design of major components, systems and features to improve the availability, scalability, latency and efficiency of services designed and built by the Government.

Key Responsibilities
  • Build Service Level Indicators (SLIs), Service Level Objectives (SLOs), Error Budgets, and Post-mortem Incident processes.
  • As part of an on-call roster, ensure the reliability and performance of critical Government services. Provide operational support and engineering for large-scale, distributed systems to drive incident resolution effectively.
  • Gather and analyse metrics and logs from Operating Systems and/or applications for capacity planning, performance tuning and fault isolation.
  • Build automation to manage services, infrastructure, and/or applications.
  • Improve reliability and quality of services using proactive monitoring.
  • Measure and optimize system performance, driving continuous improvement and pushing SRE practice forward.
  • Build an SRE playbook that the Whole-of-Government can use as a reference for SRE practice.
  • Identify potential and emerging technologies relevant to innovation for the Government.
  • Work in a cross-functional service team consisting of software engineers, infrastructure engineers, DevOps, and other specialists.

Requirements
  • 6+ years of experience in technology operations as an Infrastructure Engineer or Site Reliability Engineer - with experience operating large-scale mission critical production systems.
  • Expertise in building and operating automated monitoring and incident detection systems, creating runbooks and running incident management processes.
  • Expertise in designing automation solutions using provisioning tools, continuous integration tools (CI/CD), and scripting languages.
  • Experience leading highly complex technical projects with multiple dependencies and stakeholders.
  • Knowledgeable and experienced in working within an Agile development environment, focusing on dynamic and rapid quality delivery.
  • Proficient in building and managing highly available and scalable IT infrastructure and/or applications, with knowledge of container and virtualization technologies.
  • Proficiency in Python, PowerShell, or Ruby.
  • Proficiency with Infrastructure as Code (IaC) tools such as SaltStack, Puppet, Terraform, or Ansible.
  • Able to work independently and deliver results within specified deadlines.
  • Ability to prioritize work and strong problem-solving skills.
  • Good communication skills, both verbal and written, when dealing with users, vendors and management.
  • Ability to communicate complex interaction concepts clearly and persuasively across different audiences and various levels in GovTech.

Join us and discover a meaningful and exciting career with Assurity Trusted Solutions!

The remuneration package will be commensurate with your qualifications and experience. Interested applicants, please click "Apply Now".

We thank you for your interest and please note that only shortlisted candidates will be notified.

By submitting your application, you agree that your personal data may be collected, used and disclosed by Assurity Trusted Solutions Pte. Ltd. (ATS), GovTech and their service providers and agents in accordance with ATS’s privacy statement which can be found at https://www.assurity.sg/privacy.html or such other successor site.

Benefits
  • A wholly-owned subsidiary of GovTech.
  • We promote a learning culture and encourage you to grow and learn.
  • Annual Leave Benefits with additional perks such as Family Care and Birthday Leave.
  • Contract staff enjoy the same benefits as permanent employees.