Expert Observability SME

Marc Ellis

Saudi Arabia

On-site

SAR 120,000 - 200,000

Full time

6 days ago

Job summary

A dynamic company is seeking an experienced Observability SME to design, implement, and optimize observability solutions. The role is central to monitoring applications and infrastructure, with a strong focus on integrating tools such as ELK, Dynatrace, and BMC TrueSight. Responsibilities include incident management, defining KPIs, and automating monitoring processes, while collaborating with cross-functional teams.

Qualifications

  • 8+ years of hands-on experience with monitoring platforms such as ELK and Dynatrace.
  • Understanding of cloud and hybrid environments.
  • Automation skills with Python, PowerShell, or Bash.

Responsibilities

  • Design and implement observability solutions.
  • Deploy and optimize monitoring tools such as ELK and SolarWinds.
  • Collaborate with IT and DevOps teams to resolve performance issues.

Skills

Monitoring tool expertise
Log analysis
Automation and scripting
Networking knowledge
Incident management
Communication

Education

Bachelor's degree in computer science or a related field

Tools

ELK Stack
Dynatrace
BMC TrueSight
SolarWinds

Job description

Job Purpose:

We are seeking an experienced Observability SME with deep expertise in observability architectures and leading monitoring platforms. This role will be responsible for designing, implementing, and optimizing end-to-end observability solutions for applications, infrastructure, and networks. The ideal candidate will have extensive hands-on experience with platforms such as ELK (Elasticsearch, Logstash, Kibana), Dynatrace, BMC TrueSight, and SolarWinds, ensuring seamless monitoring, alerting, and analytics to enhance IT operations and service reliability.

Key Responsibilities:

  • Observability Strategy & Architecture: Design and implement comprehensive observability solutions to monitor applications, infrastructure, and network performance.
  • Monitoring Tool Implementation & Optimization: Deploy and fine-tune monitoring solutions using ELK, Dynatrace, BMC TrueSight, and SolarWinds.
  • Log Management & Analysis: Establish centralized logging, log parsing, and correlation for improved event detection and troubleshooting.
  • Metrics & Performance Monitoring: Define KPIs, dashboards, and alerts for proactive IT service monitoring.
  • Incident Management & Root Cause Analysis: Collaborate with IT operations, DevOps, and SRE teams to diagnose and resolve performance issues.
  • Automation & Integration: Integrate monitoring tools with ITSM platforms, AIOps solutions, and automation frameworks for enhanced efficiency.
  • Capacity Planning & Optimization: Analyze historical trends and real-time data to optimize resource allocation and performance.
  • Stakeholder Collaboration: Work closely with developers, network engineers, system administrators, and business units to ensure observability best practices are followed.
  • Continuous Improvement: Stay updated on emerging observability technologies and recommend improvements to existing processes and tools.

Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or related field (or equivalent experience).
  • Expertise in Observability & Monitoring Platforms: 8+ years of hands-on experience with ELK Stack, Dynatrace, BMC TrueSight, SolarWinds, and similar platforms.
  • Strong Knowledge of Infrastructure & Application Monitoring: Experience monitoring cloud, on-premises, and hybrid environments.
  • Experience with Log & Event Correlation: Ability to configure and analyze logs for anomaly detection and security insights.
  • Automation & Scripting: Proficiency in scripting languages such as Python, PowerShell, or Bash for automation.
  • Cloud & DevOps Understanding: Experience with cloud platforms (AWS, Azure, GCP) and CI/CD pipelines.
  • ITIL & Incident Management Exposure: Understanding of ITIL processes and IT service management (ITSM) practices.
  • Networking & Security Awareness: Knowledge of network monitoring, SNMP, and security monitoring practices.
  • Excellent Communication & Documentation Skills: Ability to present findings, create technical documentation, and train teams on observability best practices.

Preferred Qualifications:

  • Certifications in Dynatrace, ELK, BMC TrueSight, or SolarWinds.
  • Experience with AIOps, Machine Learning for Anomaly Detection, or AI-driven Observability.
  • Background in Site Reliability Engineering (SRE) or DevOps.
  • Familiarity with Infrastructure as Code (IaC) tools such as Terraform and Ansible.