Site Reliability Engineer

Techno Facts Solutions

New Delhi, Gurugram District, Dadri

On-site

INR 10,00,000 - 15,00,000

Full time

Job summary

A leading technology solutions provider in New Delhi is seeking a Site Reliability Engineer (SRE) to develop and maintain dashboards, automate production support processes, and ensure high availability of services. The ideal candidate has strong experience with AppDynamics or Dynatrace, hands-on scripting skills in Java and Python, and a proven track record in SRE, Observability, or DevOps roles. The position offers the opportunity to work in a collaborative environment focused on continuous improvement.

Qualifications

  • Proven experience in an SRE, Observability, or DevOps role.
  • Strong expertise in AppDynamics or Dynatrace.
  • Hands-on scripting skills in Java and Python.

Responsibilities

  • Develop and maintain unified dashboards for production support health.
  • Identify automation opportunities within the production support ecosystem.
  • Participate in on-call rotations and contribute to incident resolution.

Skills

Monitoring
Automation
Scripting in Java
Scripting in Python
Cloud platforms (AWS/Azure/GCP)
Creating dashboards

Tools

AppDynamics
Dynatrace
ServiceNow (SNOW)

Job description

Role & responsibilities

Key Responsibilities:

  • Develop and maintain unified dashboards that provide a holistic view of production support health, using data from ServiceNow (SNOW).
  • Create and manage application health dashboards using observability tools such as AppDynamics, Dynatrace, and other APM (Application Performance Monitoring) solutions.
  • Identify automation opportunities within the production support ecosystem and implement automation workflows to improve efficiency and reduce manual intervention.
  • Design and implement self-healing mechanisms for recurring issues using scripting languages like Java and Python.
  • Integrate logs and monitoring data with IT Service Management (ITSM) tools (e.g., SNOW) for enhanced incident and problem management.
  • Collaborate with cross-functional teams including development, operations, and support to ensure high availability and performance of critical services.
  • Participate in on-call rotations and be a key contributor to incident resolution and root cause analysis.

Key Skills and Experience:

  • Proven experience in an SRE, Observability, or DevOps role with a focus on monitoring and automation.
  • Strong expertise in AppDynamics, Dynatrace, or similar observability platforms.
  • Experience in creating dashboards with actionable insights for technical and business stakeholders.
  • Hands-on scripting skills in Java and Python, especially for automation and self-healing.
  • Familiarity with ServiceNow (SNOW) and other ITSM tools for workflow automation and incident management.
  • Deep understanding of logging and telemetry data integration with enterprise tools.
  • Knowledge of cloud platforms (e.g., AWS, Azure, GCP) and modern CI/CD pipelines is a plus.