Senior System Reliability Engineer

Be among the first applicants.
Client of Talentmate
Dubai
AED 120,000 - 180,000
3 days ago
Job description

The Senior System Reliability Engineer will be responsible for maintaining and enhancing the performance, availability, and resilience of complex IT systems and infrastructure within UAE-based organizations. Working in high-demand environments such as finance, telecom, government, or healthcare, this role focuses on automation, incident response, system monitoring, and capacity planning. The engineer ensures that all solutions comply with UAE data security regulations and business continuity requirements while supporting scalable and reliable operations.

Responsibilities:

  1. Designing and implementing system reliability strategies to ensure high availability, fault tolerance, and efficient incident management.
  2. Monitoring infrastructure and application performance using observability tools and proactively resolving system anomalies.
  3. Developing automated solutions for deployment, scaling, recovery, and configuration to reduce manual intervention and increase uptime.
  4. Leading root cause analysis (RCA) of critical incidents and implementing long-term fixes to prevent recurrence.
  5. Collaborating with DevOps, IT, and security teams to ensure system integrity, compliance, and security as per UAE data governance standards.
  6. Establishing and maintaining Service Level Objectives (SLOs) and Service Level Indicators (SLIs) in alignment with business needs.
  7. Conducting capacity planning and performance tuning to meet growing system demands and user loads.
  8. Participating in disaster recovery planning and ensuring readiness for business continuity events in accordance with UAE regulatory expectations.

Requirements:

  1. Bachelor's degree in Computer Science, Information Technology, or a related field.
  2. 5 to 7 years of experience in system reliability, infrastructure engineering, or DevOps, including at least 2 years in a senior role.
  3. Strong expertise in Linux/Unix system administration, containerization (Docker/Kubernetes), and cloud platforms (AWS, Azure, or GCP).
  4. Proficiency in automation tools such as Terraform, Ansible, and scripting languages like Python or Bash.
  5. Experience with observability and monitoring tools such as Prometheus, Grafana, Datadog, or ELK stack.
  6. Solid understanding of networking, load balancing, CI/CD pipelines, and high-availability architectures.
  7. Knowledge of UAE-specific cybersecurity, data residency, and compliance standards is highly preferred.
  8. Strong problem-solving, communication, and cross-functional collaboration skills.
  9. Fluency in English is required; Arabic is a plus for local stakeholder interaction.