Lead Site Reliability Engineer

AIQ

Abu Dhabi

On-site

AED 200,000 - 300,000

Full time

30 days ago

Job summary

AIQ is seeking a Lead Site Reliability Engineer to enhance the reliability and performance of its systems. The role involves leading SRE initiatives, mentoring team members, and building scalable solutions in collaboration with multiple teams. The ideal candidate has extensive experience applying SRE principles and expertise in cloud platforms.

Skills

Kubernetes
CI/CD
Infrastructure-as-Code
Cloud Platforms
Distributed Systems
Incident Management

Job description

About The Role

AIQ is looking for a Lead Site Reliability Engineer to drive reliability, performance, and scalability across our infrastructure. This role will lead SRE initiatives, mentor team members, and collaborate with engineering and product teams to build robust systems that can scale globally.

Responsibilities

  • Architect and lead reliability strategies across services and environments.
  • Define and enforce SLOs, SLIs, and error budgets with engineering leadership.
  • Lead incident response and root cause analysis.
  • Implement automation to reduce toil and improve system resilience.
  • Manage capacity planning, traffic forecasting, and cost optimization.
  • Mentor junior and senior SREs in technical and process excellence.
  • Collaborate with MLOps, DevSecOps, and CloudOps teams to enforce best practices.
  • Champion observability, metrics-driven decisions, and platform maturity.

Qualifications

  • 12+ years of experience in relevant roles.
  • At least 1 year of experience leading a team.
  • Expertise in Kubernetes, CI/CD (e.g., GitLab, Argo), and infrastructure-as-code (Terraform/Helm).
  • Strong experience in cloud platforms (Azure, AWS, or GCP).
  • Proven background in SRE principles, SLIs/SLOs, and reliability-focused engineering.
  • Proficiency in Python or shell scripting (nice to have).
  • Deep understanding of distributed systems, networking, and incident management.
