SRE Lead: Multi-Cloud Reliability & Automation

Bebeecloud

Xico

Hybrid

MXN 1,437,000 - 1,797,000

Full-time

Posted yesterday

Job description

A technology company based in Mexico is seeking an experienced SRE Lead to enhance reliability, scalability, and automation across multi-cloud platforms. The role involves defining key performance indicators, leading CI/CD design, and ensuring high availability of critical systems. The ideal candidate will have strong skills in Datadog, AWS, Azure, and MongoDB, along with expertise in engineering practices to optimize service delivery. This position also emphasizes collaboration with various teams to integrate reliability throughout the software development lifecycle.

Qualifications

  • Strong experience with Datadog, PagerDuty, Azure, AWS, and MongoDB.
  • Proficiency in scripting (Python, Bash) and Infrastructure as Code (Terraform, ARM templates).
  • Hands-on experience with containerization (Docker, Kubernetes).
  • Deep understanding of SLIs / SLOs, error budgets, and reliability engineering practices.
  • Expertise in CI/CD tools (Azure DevOps, Jenkins, GitHub Actions).
  • Strong automation mindset and experience with configuration management tools (Ansible, Chef, or similar).

Responsibilities

  • Design and implement SRE best practices for monitoring, alerting, and incident response.
  • Define and track SLIs, SLOs, and SLAs to improve system reliability.
  • Lead CI/CD pipeline design and optimization for multi-cloud environments.
  • Automate infrastructure provisioning and deployments using Infrastructure as Code (IaC).
  • Own incident response processes leveraging PagerDuty and Datadog.
  • Conduct post-mortems and implement preventive measures.
  • Architect and manage hybrid cloud environments.
  • Optimize cost, performance, and security across cloud services.
  • Ensure high availability and performance of MongoDB clusters.
  • Implement backup, recovery, and disaster recovery strategies.
  • Mentor SRE / DevOps engineers and foster a culture of reliability and automation.
  • Collaborate with development, QA, and product teams.

Skills

Datadog
PagerDuty
Azure
AWS
MongoDB
Python
Bash
Terraform
Docker
Kubernetes
Azure DevOps
Jenkins
GitHub Actions
Ansible
Chef