Site Reliability Engineer

NTT Data Singapore

Singapore

On-site

SGD 80,000 - 120,000

Full time

Yesterday

Job summary

A leading technology services company in Singapore is looking for a Site Reliability Engineer to join its Observability team. Candidates should have at least 5 years of experience and a strong understanding of building and maintaining observability infrastructure. Responsibilities include designing and maintaining monitoring and logging systems, onboarding applications, and providing incident support. The role requires proficiency in scripting languages and experience with a range of observability tools. This is a 12-month renewable contract position.

Qualifications

  • 5+ years of experience as an SRE or in a similar role focused on observability.
  • Strong understanding of distributed systems and microservices architectures.
  • Excellent communication and collaboration skills.

Responsibilities

  • Design, build, and maintain observability infrastructure.
  • Onboard applications to the observability platform.
  • Provide support during incidents and ensure quick resolutions.
  • Automate tasks related to observability for efficiency.
  • Develop effective alerting strategies and dashboards.

Skills

Monitoring tools
Logging platforms
Distributed tracing systems
Scripting in Python
Scripting in Go
Scripting in Bash

Education

Bachelor's degree in computer science or related field

Tools

Prometheus
Grafana
Elasticsearch
Jaeger
Fluentd
Datadog
Dynatrace
Bash

Job description
Role: Site Reliability Engineer (12-month renewable contract)
Experience: Minimum of 5 years
Location: Changi Business Park
Summary

We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our growing Observability team. The ideal candidate will have a strong background in building and maintaining robust observability environments, including monitoring, logging, and tracing systems. This role will focus on the design, implementation, and support of our observability infrastructure, ensuring the seamless onboarding of applications and providing critical support during incidents.

Responsibilities
  • Observability Environment Management: Design, build, and maintain our observability infrastructure, including monitoring tools, logging platforms, and distributed tracing systems (e.g., Prometheus, Grafana, Elasticsearch). This includes capacity planning, performance tuning, and ensuring high availability.
  • Application Onboarding: Work with development teams to onboard applications to our observability platform, providing guidance on instrumentation best practices and ensuring data quality. This includes creating and maintaining documentation and training materials.
  • Incident Support: Provide timely and effective support during incidents, leveraging observability data to diagnose and resolve issues quickly. This includes contributing to post-incident reviews and implementing preventative measures.
  • Automation: Automate repetitive tasks and processes related to observability, improving efficiency and reducing manual effort. This may involve scripting, developing tools, or integrating with CI/CD pipelines.
  • Alerting and Monitoring: Develop and maintain effective alerting strategies, ensuring appropriate escalation procedures and minimizing noise. This includes creating dashboards and reports to visualize system health and performance.
Qualifications
  • Bachelor's degree in computer science or a related field, or equivalent experience.
  • 5+ years of experience as an SRE or in a similar role with a focus on observability.
  • Strong understanding of distributed systems and microservices architectures.
  • Experience with monitoring, logging, and tracing tools (e.g., Prometheus, Grafana, Jaeger, Elasticsearch, Fluentd, Datadog, Dynatrace).
  • Proficiency in scripting languages such as Python, Go, or Bash.
  • Strong problem-solving and analytical skills.
  • Excellent communication and collaboration skills.
Bonus Points
  • Experience with cloud platforms.
  • Experience with infrastructure-as-code tools (e.g., Terraform, Ansible).