Lead Reliability Engineer, Enterprise SaaS Platform (Dubai)
Core Mission
This role drives reliability and scalability for a global SaaS platform serving enterprise clients. The ideal candidate will balance hands-on technical leadership with team enablement, ensuring high availability, security, and performance across a cloud-native stack while fostering a culture of ownership and operational excellence.
Key Responsibilities
Team Leadership
- Manage and mentor a growing team of reliability engineers and DevOps specialists, emphasizing psychological safety, professional growth, and collaborative problem-solving.
- Define processes for planning, prioritization, and delivery in a fast-paced environment, balancing velocity with long-term system health.
- Champion reliability principles across engineering teams, advocating for resilience strategies, incident preparedness, and blameless postmortems.
Technical Execution
- Architect and optimize AWS-native infrastructure (EKS, Aurora), managed with Terraform, to support scalability, automation, and observability.
- Lead CI/CD enhancements, release automation, and developer tooling to accelerate deployment cycles without compromising stability.
- Advance monitoring maturity through improved dashboards, alerts (e.g., CloudWatch, Prometheus), and SLO-driven instrumentation to preemptively address risks.
Operational Resilience
- Translate incidents into coaching opportunities, strengthening cross-team operational readiness and response protocols.
- Partner with security teams to conduct audits and vulnerability assessments and to ensure compliance across cloud environments.
- Mitigate technical debt by prioritizing high-impact infrastructure investments and automating repetitive tasks.
Strategic Impact
- Align reliability initiatives with organizational goals, translating long-term vision into actionable engineering roadmaps.
- Optimize vendor relationships (e.g., AWS, New Relic) to balance cost, capability, and innovation.
- Promote a culture of urgency and ownership, encouraging proactive problem-solving and accountability during high-stakes scenarios.
Critical Qualifications
- Proven experience in AWS-based SRE/DevOps roles at scale, ideally supporting B2B SaaS platforms with 10,000+ daily active users.
- Dual expertise as a hands-on engineer and people leader, capable of coding complex solutions while mentoring junior team members.
- Technical fluency in infrastructure-as-code (Terraform), Kubernetes (EKS), observability tooling, and incident management frameworks.
- Mindset fit: Thrives in ambiguous environments, prioritizes impact over perfection, and fosters collaboration across security, QA, and product teams.
Location: Dubai, UAE