Senior Engineer - Site Reliability

AIQ

Abu Dhabi

On-site

AED 200,000 - 300,000

Full time

5 days ago

Job summary

AIQ, an Abu Dhabi-based joint venture, seeks a Senior Site Reliability Engineer to improve system reliability and performance. The role centers on incident response and system monitoring for AI solutions in the energy sector.

Qualifications

  • 5-8 years of relevant experience.
  • Hands-on experience with Docker, Kubernetes.
  • Proficiency in Python and Bash.

Responsibilities

  • Maintain and improve monitoring and incident response systems.
  • Identify performance bottlenecks and resolve them proactively.
  • Lead root cause analysis and develop preventive measures.

Skills

Containerized environments
CI/CD pipelines
Scripting languages
Observability tools
Cloud platforms

Education

Bachelor of Business Administration (Management)

Job description

Nationality: Any Nationality

Vacancy: 1 Vacancy

Overview

About AIQ: AIQ is an Abu Dhabi-based joint venture between Presight and ADNOC that develops artificial intelligence technologies for the energy sector. It builds and commercializes AI products that reduce costs and generate revenue for clients, drawing on data, cloud computing, and talented professionals. AIQ offers an innovative environment with access to advanced AI infrastructure, including NVIDIA GPU cloud computing platforms.

About The Role

AIQ is seeking a Senior Site Reliability Engineer to lead root cause analysis, improve system reliability, and collaborate with engineering teams to enhance service performance and stability. The role involves leading reliability projects, improving observability, and responding to complex production incidents.

Responsibilities
  • Maintain and improve monitoring, alerting, and incident response systems.
  • Identify and resolve performance bottlenecks and failure points proactively.
  • Contribute to infrastructure automation and deployment pipelines.
  • Promote SLO/SLI adoption in collaboration with engineering teams.
  • Lead root cause analysis and develop preventive measures.
  • Mentor junior engineers and help scale operational excellence.

Qualifications
  • 5-8 years of relevant experience.
  • Experience with containerized environments (Docker, Kubernetes).
  • Hands-on experience with CI/CD pipelines and automation tools.
  • Proficiency in scripting languages such as Python and Bash.
  • Knowledge of observability tools like Prometheus, Grafana, ELK, and Sentry.
  • Familiarity with cloud platforms; Huawei Cloud and Azure preferred.

Job Details

Role Level: Mid-Level

Work Type: Full-Time

Country: United Arab Emirates

City: Abu Dhabi

Company Website: [Insert URL]

Disclaimer

Naukrigulf.com is a platform connecting jobseekers and employers. Applicants should verify the legitimacy of employers independently. We do not endorse any requests for payments or sharing personal/bank information. For security, visit our Security Advice page. Report any fraud to abuse@naukrigulf.com.
