Site Reliability Engineer

NTT Data Singapore

Singapore

On-site

SGD 80,000 - 120,000

Full time

Yesterday

Job summary

A leading technology services company in Singapore is looking for a Site Reliability Engineer to join its Observability team. The candidate should have at least 5 years of experience and a strong understanding of building and maintaining observability infrastructure. Responsibilities include designing and maintaining monitoring and logging systems, onboarding applications, and providing incident support. The role requires proficiency in scripting languages and experience with observability tools such as Prometheus, Grafana, and Elasticsearch. This is a 12-month renewable contract position.

Qualifications

5+ years of experience as an SRE or in a similar role focused on observability.
Strong understanding of distributed systems and microservices architectures.
Excellent communication and collaboration skills.

Responsibilities

Design, build, and maintain observability infrastructure.
Onboard applications to the observability platform.
Provide support during incidents and ensure quick resolutions.
Automate tasks related to observability for efficiency.
Develop effective alerting strategies and dashboards.

Skills

Monitoring tools

Logging platforms

Distributed tracing systems

Scripting in Python

Scripting in Go

Scripting in Bash

Education

Bachelor's degree in computer science or related field

Tools

Prometheus

Grafana

Elasticsearch

Jaeger

Fluentd

Datadog

Dynatrace

Bash

Role: Site Reliability Engineer - 12-month renewable contract

Experience: Minimum of 5 years

Location: Changi Business Park

Summary

We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our growing Observability team. The ideal candidate will have a strong background in building and maintaining robust observability environments, including monitoring, logging, and tracing systems. This role will focus on the design, implementation, and support of our observability infrastructure, ensuring the seamless onboarding of applications and providing critical support during incidents.

Responsibilities

Observability Environment Management: Design, build, and maintain our observability infrastructure, including monitoring tools, logging platforms, and distributed tracing systems (e.g., Prometheus, Grafana, Elasticsearch). This includes capacity planning, performance tuning, and ensuring high availability.
Application Onboarding: Work with development teams to onboard applications to our observability platform, providing guidance on instrumentation best practices and ensuring data quality. This includes creating and maintaining documentation and training materials.
Incident Support: Provide timely and effective support during incidents, leveraging observability data to diagnose and resolve issues quickly. This includes contributing to post-incident reviews and implementing preventative measures.
Automation: Automate repetitive tasks and processes related to observability, improving efficiency and reducing manual effort. This may involve scripting, developing tools, or integrating with CI/CD pipelines.
Alerting and Monitoring: Develop and maintain effective alerting strategies, ensuring appropriate escalation procedures and minimizing noise. This includes creating dashboards and reports to visualize system health and performance.

Qualifications

Bachelor's degree in computer science or a related field, or equivalent experience.
5+ years of experience as an SRE or in a similar role with a focus on observability.
Strong understanding of distributed systems and microservices architectures.
Experience with monitoring, logging, and tracing tools (e.g., Prometheus, Grafana, Jaeger, Elasticsearch, Fluentd, Datadog, Dynatrace).
Proficiency in scripting languages such as Python, Go, or Bash.
Strong problem-solving and analytical skills.
Excellent communication and collaboration skills.

Bonus Points

Experience with cloud platforms.
Experience with infrastructure-as-code tools (e.g., Terraform, Ansible).
