SRE - Site Reliability Engineer Opening for Abu Dhabi - Happiest Minds

InnovaziT (A Happiest Minds company)

Abu Dhabi

On-site

AED 250,000 - 300,000

Full time

Today

Job summary

A financial technology company seeks a Site Reliability Engineer (SRE) to ensure the smooth operation of its digital banking services. Responsibilities include defining SLIs/SLOs, leading incident management, and automating operational processes. The ideal candidate has 5+ years of experience in SRE or DevOps roles, a degree in Computer Science (or equivalent technical experience), and proficiency with tools such as Terraform, Kubernetes, AWS, and observability platforms. Join us to enhance reliability and drive automation across the team.

Qualifications

5+ years of experience in SRE or DevOps roles.
Strong experience with Linux environments.
Proficiency with Kubernetes and container orchestration.

Responsibilities

Define and implement SLIs, SLOs, and error budgets.
Lead incident management and root-cause analysis.
Automate operational processes and enhance observability.

Skills

Automation skills

Strong analytical mindset

Experience in cross-functional collaboration

Problem-solving skills

Calm communicator under pressure

Education

Bachelor’s degree in Computer Science

Tools

Terraform

Kubernetes

AWS

Dynatrace

Grafana

Prometheus

ELK stack

Warm greetings from Happiest Minds Technologies,

Please find the job description below, and kindly go through the company profile link in the signature. Do apply if you are interested.

Site Reliability Engineer (SRE)

From designing fault-tolerant architectures to leading incident responses, you'll have the freedom to shape how we deliver stable, secure, and high-performance banking services.

About the Role

We're looking for a talented Site Reliability Engineer (SRE) to keep our systems running smoothly, reliably, and at scale. Through smart automation, deep observability, and a calm head in a crisis, you'll help us balance speed, compliance, and stability, working alongside DevOps, Cloud, Quality Engineering, and Product teams to drive continuous improvements in performance, security, and resilience.

You'll play a key role in enhancing reliability, accelerating delivery, and ensuring seamless digital experiences for ADCB customers.

This role reports directly to our Lead SRE / Tribe Executive Manager.

What You Will Be Doing

Define and implement SLIs / SLOs and error budgets for business‑critical digital banking services.
Build actionable observability (metrics, logs, traces, dashboards, and alerts) using Dynatrace, Prometheus, Grafana, and ELK, while reducing alert fatigue.
Leverage AI‑driven insights and anomaly detection (Dynatrace Davis AI or equivalent AIOps platform) to proactively predict and resolve reliability issues before impact.
Lead incident management from on‑call triage and root‑cause analysis to blameless postmortems with actionable follow‑ups.
Improve deployment safety with robust rollout / rollback strategies, canary and blue‑green deployments, and production readiness reviews.
Support and optimize microservices‑based architectures, ensuring service reliability, scalability and inter‑service resilience.
Conduct capacity planning, performance tuning and resilience testing, optimizing for both reliability and cost efficiency.
Automate operational toil, from runbooks and remediation scripts to proactive health checks and self‑healing workflows.
Collaborate with DevOps to embed reliability gates and validations into CI / CD pipelines (GitHub Actions, Jenkins, GitLab CI / CD or Azure DevOps).
Own and evolve the observability and AIOps stack, driving intelligent automation and predictive alerting capabilities.
Maintain high‑quality documentation, playbooks and operational standards across environments.
Ensure operational compliance and security alignment with internal controls and regulatory standards.
Analyze system performance, availability and cost data to continually optimize operations.
Provide reliability support and escalation guidance for critical production systems during major incidents.

Skills

Experience and Qualifications

5+ years of experience in SRE or DevOps roles, building and managing large‑scale, high‑availability systems across banking, fintech, e‑commerce, or other data‑intensive digital ecosystems.
Bachelor’s degree in Computer Science or equivalent technical experience.
Strong experience with Linux environments and performance troubleshooting.
Proven expertise in Terraform and Infrastructure as Code (IaC) methodologies.
Proficiency with Kubernetes and container orchestration in microservices environments.
Hands‑on experience with AWS (preferred); exposure to Azure or GCP is an advantage.
Deep knowledge of Dynatrace (AIOps, Davis AI), Prometheus, Grafana, and the ELK stack.
Experience implementing AI / ML‑driven reliability or automation solutions (AIOps, anomaly detection, predictive alerting).
Practical understanding of CI / CD pipelines (GitHub Actions, Jenkins, GitLab CI / CD or Azure DevOps).
Experience with Kafka and RabbitMQ messaging, Redis, and Aurora / RDS databases.
Strong scripting or programming skills in Python, Bash, or Go.

The Ideal Candidate

Organized, structured, and meticulous in approach.
Experienced in cross‑functional collaboration and working with distributed teams.
Strong analytical mindset with excellent troubleshooting skills for complex production systems.
Calm and composed communicator under pressure, capable of leading during high‑impact incidents.
Proactive problem‑solver who anticipates issues and drives preventive improvements.
Passionate about AI‑driven automation, observability, and reliability engineering.
Continuously learning, keeping up‑to‑date with cloud‑native, microservices, and SRE best practices.
A collaborative and adaptable team player who thrives in a fast‑paced, regulated environment and is passionate about building reliable, scalable systems that empower digital banking innovation.

Warm regards,

Devipriya Gunasekaran

Talent Acquisition Team

Bangalore

Happiest Minds Technologies
