Datadog L3 Engineer

Weekday AI

Singapore

On-site

SGD 80,000 - 100,000

Full time

Today

Job summary

A technology consultancy seeks a full-time Datadog L3 Engineer based in Singapore. This role involves designing and implementing observability solutions for complex systems. The ideal candidate will have over 5 years of experience in monitoring, expertise in Datadog, and skills in Terraform. It requires collaboration with technical teams and a proactive approach to problem-solving. Join us to enhance system visibility and reliability in a dynamic environment.

Qualifications

  • 5+ years of hands-on experience in monitoring, observability, or SRE roles.
  • Strong expertise with Datadog, including logs, metrics, dashboards, and RUM.
  • Proven experience using Terraform for infrastructure management.
  • Solid knowledge of Docker and container-based environments.
  • Strong understanding of ITIL processes.

Responsibilities

  • Design and maintain observability solutions using Datadog.
  • Act as an L3 escalation point for monitoring issues.
  • Implement and optimize log management pipelines.
  • Lead infrastructure and application monitoring setup.
  • Build and manage monitoring infrastructure as code.
  • Support Docker-based platforms.
  • Integrate Datadog with CI/CD pipelines.
  • Define observability standards with teams.
  • Maintain documentation and operational guides.
  • Evaluate system performance and recommend improvements.
  • Mentor junior engineers on observability strategies.

Skills

Monitoring
Observability
Site Reliability Engineering
Datadog
Terraform
Docker
ITIL
Cloud Platforms

Job description

This role is for one of Weekday's clients.

Min Experience: 5 years

Location: Singapore

Job Type: Full-time

As a Datadog L3 Engineer, you will play a critical role in designing, implementing, and operating advanced observability solutions for complex, large-scale technology environments. Based in Singapore, this full-time role is ideal for a highly skilled professional with deep hands-on experience in monitoring, logging, metrics, and real-user monitoring (RUM). You will act as a subject matter expert for Datadog, supporting mission-critical systems, driving operational excellence, and ensuring high availability, performance, and reliability across infrastructure and applications. This role requires strong collaboration with engineering, DevOps, and operations teams, along with a solid understanding of ITIL practices and modern cloud-native tooling.

Key Responsibilities

  • Design, configure, and maintain end-to-end observability solutions using Datadog, including logs, metrics, traces, and RUM for distributed systems
  • Act as an L3 escalation point for complex monitoring, performance, and availability issues, performing deep root cause analysis and remediation
  • Implement and optimize log management pipelines, dashboards, alerts, and service-level indicators (SLIs/SLOs) to improve system visibility and reliability
  • Lead the setup and tuning of infrastructure and application monitoring across containerized and cloud environments
  • Build and manage monitoring infrastructure as code using Terraform, ensuring consistency, scalability, and repeatability
  • Support Docker-based platforms by monitoring container health, performance, and resource utilization
  • Integrate Datadog with CI/CD pipelines and cloud services to enable proactive detection of issues
  • Collaborate with DevOps, SRE, and application teams to define observability standards and best practices
  • Ensure adherence to ITIL processes for incident, problem, and change management
  • Create and maintain detailed documentation, runbooks, and operational guides
  • Continuously evaluate system performance trends and recommend improvements to enhance stability and user experience
  • Mentor junior engineers and provide technical guidance on observability and monitoring strategies

What Makes You a Great Fit

  • At least 5 years of hands-on experience in monitoring, observability, or site reliability engineering roles
  • Strong expertise with Datadog, including logs, metrics, dashboards, alerts, and Real User Monitoring (RUM)
  • Proven experience using Terraform to manage infrastructure and monitoring configurations
  • Solid hands-on knowledge of Docker and container-based environments
  • Strong understanding of ITIL processes and experience working in structured operational environments
  • Ability to troubleshoot complex, large-scale production issues with a methodical and analytical approach
  • Experience working with cloud platforms and modern DevOps toolchains
  • Excellent communication skills, with the ability to collaborate across technical and non-technical teams
  • A proactive mindset with a strong focus on automation, reliability, and continuous improvement
  • Comfortable working in fast-paced, high-availability environments with ownership and accountability