SRE Engineer

TECHNOPALS CONSULTANTS PTE. LTD.

Singapore

On-site

SGD 60,000 - 90,000

Full time

Today

Job summary

A leading consulting firm based in Singapore is seeking a skilled application monitoring engineer to enhance and manage its monitoring infrastructure. You will work closely with application teams on OpenShift migrations and deployments, and troubleshoot issues in Kubernetes. You will maintain observability practices, focusing on metrics data stores such as Prometheus and visualization tools such as Grafana. Ideal candidates will have hands-on experience with monitoring technologies and a strong understanding of site reliability engineering principles. Weekend support may be required and is compensated with time off.

Qualifications

  • Experience in managing open source-based application monitoring infrastructure.
  • Hands-on experience with Elasticsearch, Prometheus, and Grafana.
  • Understanding of SRE practices and CI/CD pipelines.

Responsibilities

  • Enhance and migrate application monitoring infrastructure.
  • Support migration to OpenShift, including deployment and troubleshooting.
  • Maintain observability culture within the development community.

Skills

Monitoring application infrastructure
Kubernetes
OpenShift
Prometheus
Grafana
Linux OS troubleshooting
Elasticsearch/Kibana
Visualization tools administration
Observability culture implementation

Tools

Elasticsearch
Grafana
OpenTelemetry
CI/CD pipelines

Job description

Mandatory Skills

  • Maintain open source-based application monitoring infrastructure. Enhance, optimize, and migrate to new solutions if required.
  • Support application teams in migrating to the latest OpenShift versions, deploy stateful and stateless apps, and troubleshoot issues on Kubernetes/OpenShift platforms.
  • Work with application developers to implement application instrumentation libraries and frameworks.
  • Maintain the metrics data store using TSDBs such as Prometheus. Perform administration and tuning, such as cardinality and resource optimization.
  • Maintain distributed tracing infrastructure such as OpenTelemetry (OTel), Jaeger, Zipkin, etc. Perform administrative functions and tuning, such as sampling strategy. Troubleshoot distributed tracing in microservices.
  • Perform production support for enterprise logging platforms such as the ELK stack, Grafana Loki, etc. Work on Index Lifecycle Management in Elasticsearch.
  • Implement alerting infrastructure and integrate it with PagerDuty, MS Teams, and any other software that needs alert-based mitigation or action. Assist the application support team in defining alerting rules for enterprise business apps.
  • Deploy and administer visualization tools such as Grafana and Elastic. Create reusable dashboard templates and implement RBAC for the entire user base.
  • Educate the dev community on observability and embed an observability culture. Assist developers in identifying golden signals, defining SLIs and SLOs for enterprise applications, and calculating error budgets, MTTD, and MTTR.
  • Troubleshoot infrastructure issues in the observability stack on Linux VMs and Kubernetes pods. Set up and secure reverse proxies, secure all application endpoints with TLS, and enable MFA, LDAPS, and OAuth as required.
  • Configure CI/CD pipelines for all monitoring infrastructure and services. Modify and extend existing pipelines to cater to multiple environments and regions.

The candidate should have hands-on experience with the key technologies below:

  • Elasticsearch/Kibana – Cluster Management, Search Optimization
  • Prometheus/Grafana
  • OpenTelemetry
  • Linux OS troubleshooting
  • Kubernetes deployments, CI/CD pipelines
  • Good understanding of SRE practices

Working Hours

Mon to Fri, 9 AM to 6 PM

Weekend deployment support is sometimes required and is compensated with time off in lieu.
