Site Reliability Engineer

Discovered MENA

United Arab Emirates

On-site

AED 120,000 - 180,000

Full time

30+ days ago

Job summary

An established industry player is seeking a Site Reliability Engineer to enhance their tech team in Dubai. This role involves architecting and implementing high-performance AI and data infrastructure, utilizing cutting-edge cloud technologies and automation tools. You will collaborate with AI researchers and developers to ensure seamless integration of infrastructure with project needs. This innovative firm offers a diverse work environment, dedicated to making a significant impact across the UAE. If you're passionate about site reliability and eager to work with advanced technologies, this opportunity is perfect for you.

Qualifications

  • 3-5 years of experience in site reliability engineering or a similar role.
  • Strong knowledge of cloud platforms and container orchestration.

Responsibilities

  • Architect and oversee scalable AI and data infrastructure across cloud and on-prem environments.
  • Develop CI/CD pipelines to accelerate deployment of AI models.

Skills

Cloud Platforms (AWS, GCP, Azure)
Container Orchestration (Kubernetes, Docker)
Automation Tools (Terraform, Ansible)
Machine Learning Frameworks (TensorFlow, PyTorch)
Data Processing Tools (Apache Spark)
Incident Management
Security Protocols

Education

Bachelor's Degree in Computer Science
Master's Degree in Software Engineering

Tools

Terraform
Ansible
Kubernetes
Docker

Job description

Site Reliability Engineer (SRE)

Location: Dubai

Duration: Permanent

We're currently partnered with a leading technology consultancy who are scaling their tech team. They offer a diverse work environment and provide services across the UAE that impact millions of lives. We're helping them search for a Site Reliability Engineer to join their ever-growing team.

Responsibilities:

  • Architect, implement, and oversee scalable, high-performance AI and data infrastructure across cloud (AWS) and on-prem environments.
  • Utilise automation tools (e.g., Terraform, Ansible) for provisioning, monitoring, and infrastructure optimisation.
  • Design robust monitoring, alerting, and logging solutions to detect and mitigate potential failures before they impact operations.
  • Develop and maintain seamless CI/CD pipelines to accelerate the deployment of AI models and data-driven applications.
  • Optimise workflows to enhance efficiency, reduce deployment friction, and maintain system stability.
  • Partner with AI researchers, data engineers, and developers to align infrastructure with project needs.
  • Act as a bridge between AI, data, and infrastructure teams, ensuring smooth communication and technical alignment.
  • Rapidly diagnose and resolve system incidents, conducting thorough root-cause analyses to prevent future issues.
  • Establish and refine disaster recovery frameworks to safeguard AI and data assets.
  • Implement stringent security protocols to protect AI and data infrastructure, ensuring compliance with industry regulations.
  • Perform regular security evaluations, proactively addressing vulnerabilities.
  • Identify opportunities to improve system scalability, efficiency, and resilience.
  • Stay ahead of emerging trends in AI infrastructure, site reliability engineering, and cloud technologies.

Qualifications & skills:

  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
  • 3-5 years of experience in a similar role.
  • Experience with on-premises and cloud platforms (AWS, GCP, Azure) and container orchestration (Kubernetes, Docker).
  • Experience with AI and data-specific infrastructure (e.g., GPU clusters, data lakes).
  • Understanding of machine learning frameworks and data processing tools (e.g., TensorFlow, PyTorch, Apache Spark).