The Job Description
- Ensure System Uptime and Reliability: Monitor and maintain cloud-based applications and infrastructure, ensuring minimal downtime and efficient incident response.
- Build and Optimize Monitoring and Alerting Systems: Set up and continuously improve comprehensive monitoring and alerting frameworks to detect and address issues proactively.
- Cloud Infrastructure Management: Manage, optimize, and scale systems on the Azure cloud platform, ensuring high performance and cost-effectiveness.
- Incident Management and Response: Act as the first line of defense in identifying, diagnosing, and resolving technical issues in real time, or escalating them to the appropriate teams.
- Required Skills and Platforms: Experience with cloud platforms (AWS, Azure, GCP), reliability and scalability testing, monitoring tools, incident response, and disaster recovery.
- Tooling and Observability: Leverage technologies such as Grafana for observability and Argo for CI/CD automation to strengthen incident response.
- Collaboration: Work closely with cross-functional teams to align on SRE best practices, share insights, and support development and operational goals.
- Language Skills: Fluent spoken and written Arabic and English.