Site Reliability Engineer
ELLIOTT MOSS CONSULTING PTE. LTD.
Singapore
On-site
SGD 70,000 - 90,000
Full time
Job summary
A technology consulting firm based in Singapore seeks a specialist to design and implement hybrid cloud observability and monitoring solutions. The ideal candidate will have strong experience with tools such as Splunk and Prometheus, as well as with developing proactive alerting systems and managing centralized logging pipelines. Strong analytical skills and the ability to collaborate closely with infrastructure teams are essential. The role offers the opportunity to work in a dynamic environment focused on performance tuning and compliance.
Key Responsibilities
- Design and implement hybrid cloud observability and monitoring solutions across multiple environments.
- Develop and manage alerting systems, metrics, and dashboards for proactive issue detection.
- Integrate structured and unstructured data sources into logging pipelines.
- Implement log archiving strategies (e.g., archiving to Amazon S3) for compliance and cost optimization.
- Perform advanced log analysis and correlation to support root cause investigation and performance tuning.
- Collaborate with infrastructure and development teams to define and track SLIs, SLOs, and SLAs.
Required Skills
- Strong experience with Splunk, Prometheus, Grafana, and Amazon CloudWatch.
- Proficiency with ELK/EFK stacks (Elasticsearch, Logstash/Fluentd, Kibana).
- Hands-on experience creating custom dashboards, alerts, and metrics visualizations.
- Experience building and managing centralized logging pipelines across distributed systems.
- Familiarity with S3 log archiving and multi-environment data integrations.