Datadog L3 Engineer

Weekday AI

Singapore

On-site

SGD 80,000 - 100,000

Full time

Today

Job summary

A technology consultancy seeks a full-time Datadog L3 Engineer based in Singapore. This role involves designing and implementing observability solutions for complex systems. The ideal candidate will have over 5 years of experience in monitoring, expertise in Datadog, and skills in Terraform. The role requires collaboration with technical teams and a proactive approach to problem-solving. Join us to enhance system visibility and reliability in a dynamic environment.

Qualifications

5+ years of hands-on experience in monitoring, observability, or SRE roles.
Strong expertise with Datadog, including logs, metrics, dashboards, and RUM.
Proven experience using Terraform for infrastructure management.
Solid knowledge of Docker and container-based environments.
Strong understanding of ITIL processes.

Responsibilities

Design and maintain observability solutions using Datadog.
Act as an L3 escalation point for monitoring issues.
Implement and optimize log management pipelines.
Lead infrastructure and application monitoring setup.
Build and manage monitoring infrastructure as code.
Support Docker-based platforms.
Integrate Datadog with CI/CD pipelines.
Define observability standards with teams.
Maintain documentation and operational guides.
Evaluate system performance and recommend improvements.
Mentor junior engineers on observability strategies.

Skills

Monitoring

Observability

Site Reliability Engineering

Datadog

Terraform

Docker

ITIL

Cloud Platforms

This role is for one of Weekday's clients

Min Experience: 5 years

Location: Singapore

JobType: full-time

As a Datadog L3 Engineer, you will play a critical role in designing, implementing, and operating advanced observability solutions for complex, large-scale technology environments. Based in Singapore, this full-time role is ideal for a highly skilled professional with deep hands-on experience in monitoring, logging, metrics, and real-user monitoring (RUM). You will act as a subject matter expert for Datadog, supporting mission-critical systems, driving operational excellence, and ensuring high availability, performance, and reliability across infrastructure and applications. This role requires strong collaboration with engineering, DevOps, and operations teams, along with a solid understanding of ITIL practices and modern cloud-native tooling.

Key Responsibilities

Design, configure, and maintain end-to-end observability solutions using Datadog, including logs, metrics, traces, and RUM for distributed systems
Act as an L3 escalation point for complex monitoring, performance, and availability issues, performing deep root cause analysis and remediation
Implement and optimize log management pipelines, dashboards, alerts, and service-level indicators (SLIs/SLOs) to improve system visibility and reliability
Lead the setup and tuning of infrastructure and application monitoring across containerized and cloud environments
Build and manage monitoring infrastructure as code using Terraform, ensuring consistency, scalability, and repeatability
Support Docker-based platforms by monitoring container health, performance, and resource utilization
Integrate Datadog with CI/CD pipelines and cloud services to enable proactive detection of issues
Collaborate with DevOps, SRE, and application teams to define observability standards and best practices
Ensure adherence to ITIL processes for incident, problem, and change management
Create and maintain detailed documentation, runbooks, and operational guides
Continuously evaluate system performance trends and recommend improvements to enhance stability and user experience
Mentor junior engineers and provide technical guidance on observability and monitoring strategies

What Makes You a Great Fit

At least 5 years of hands-on experience in monitoring, observability, or site reliability engineering roles
Strong expertise with Datadog, including logs, metrics, dashboards, alerts, and Real User Monitoring (RUM)
Proven experience using Terraform to manage infrastructure and monitoring configurations
Solid hands-on knowledge of Docker and container-based environments
Strong understanding of ITIL processes and experience working in structured operational environments
Ability to troubleshoot complex, large-scale production issues with a methodical and analytical approach
Experience working with cloud platforms and modern DevOps toolchains
Excellent communication skills, with the ability to collaborate across technical and non-technical teams
A proactive mindset with a strong focus on automation, reliability, and continuous improvement
Comfortable working in fast-paced, high-availability environments with ownership and accountability
