Reliability Engineer - Multi-Cloud Specialist

Bebeecloud

Xico

Hybrid

MXN 1,437,000 - 1,797,000

Full-time

Posted today

Vacancy description

A technology company based in Mexico is seeking an experienced SRE Lead to enhance reliability, scalability, and automation across multi-cloud platforms. The role involves defining key performance indicators, leading CI/CD design, and ensuring high availability of critical systems. The ideal candidate will have strong skills in Datadog, AWS, Azure, and MongoDB, along with expertise in engineering practices to optimize service delivery. This position also emphasizes collaboration with various teams to integrate reliability throughout the software development lifecycle.

Qualifications

  • Strong experience with Datadog, PagerDuty, Azure, AWS, and MongoDB.
  • Proficiency in scripting (Python, Bash) and Infrastructure as Code (Terraform, ARM templates).
  • Hands-on experience with containerization (Docker, Kubernetes).
  • Deep understanding of SLIs / SLOs, error budgets, and reliability engineering practices.
  • Expertise in CI/CD tools (Azure DevOps, Jenkins, GitHub Actions).
  • Strong automation mindset and experience with configuration management tools (Ansible, Chef, or similar).

Responsibilities

  • Design and implement SRE best practices for monitoring, alerting, and incident response.
  • Define and track SLIs, SLOs, and SLAs to improve system reliability.
  • Lead CI/CD pipeline design and optimization for multi-cloud environments.
  • Automate infrastructure provisioning and deployments using Infrastructure as Code (IaC).
  • Own incident response processes leveraging PagerDuty and Datadog.
  • Conduct post-mortems and implement preventive measures.
  • Architect and manage hybrid cloud environments.
  • Optimize cost, performance, and security across cloud services.
  • Ensure high availability and performance of MongoDB clusters.
  • Implement backup, recovery, and disaster recovery strategies.
  • Mentor SRE / DevOps engineers and foster a culture of reliability and automation.
  • Collaborate with development, QA, and product teams.

Skills

Datadog
PagerDuty
Azure
AWS
MongoDB
Python
Bash
Terraform
Docker
Kubernetes
Azure DevOps
Jenkins
GitHub Actions
Ansible
Chef

Job description
Job Role Summary

Seeking an experienced SRE Lead to drive reliability, scalability, and automation across multi‑cloud and application platforms.

Job Description

As a seasoned SRE and DevOps Lead, you will combine leadership, hands‑on engineering, and strategic thinking to ensure high availability and performance of mission‑critical systems.

Key Responsibilities
  • Design and implement SRE best practices for monitoring, alerting, and incident response.
  • Define and track SLIs, SLOs, and SLAs to improve system reliability.
  • Lead CI/CD pipeline design and optimization for multi‑cloud environments (Azure & AWS).
  • Automate infrastructure provisioning and deployments using Infrastructure as Code (IaC).
  • Own incident response processes leveraging PagerDuty and Datadog for alerting and observability.
  • Conduct post‑mortems and implement preventive measures.
  • Architect and manage hybrid cloud environments (Azure, AWS).
  • Optimize cost, performance, and security across cloud services.
  • Ensure high availability and performance of MongoDB clusters.
  • Implement backup, recovery, and disaster recovery strategies.
  • Mentor SRE / DevOps engineers and foster a culture of reliability and automation.
  • Collaborate with development, QA, and product teams to embed reliability into the SDLC.
Required Skills & Qualifications

The ideal candidate should possess the following skills:

  • Strong experience with Datadog, PagerDuty, Azure, AWS, and MongoDB.
  • Proficiency in scripting (Python, Bash) and Infrastructure as Code (Terraform, ARM templates).
  • Hands‑on experience with containerization (Docker, Kubernetes).
  • Deep understanding of SLIs / SLOs, error budgets, and reliability engineering practices.
  • Expertise in CI/CD tools (Azure DevOps, Jenkins, GitHub Actions).
  • Strong automation mindset and experience with configuration management tools (Ansible, Chef, or similar).
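
For candidates gauging the SLO and error-budget requirement above, the idea can be sketched in a few lines of Python. This is only an illustration, not the employer's tooling: the function name, the request-based availability SLI, and the sample numbers are all assumptions.

```python
def error_budget_remaining(slo_target: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent in a window.

    1.0 means nothing spent, 0.0 means exhausted, negative means overspent.
    Hypothetical helper: assumes a request-based availability SLI.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        return 1.0  # no traffic in the window, so no budget has been spent
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Assumed sample values: 99.9% target, 1,000,000 requests, 250 failures.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.0%}")
```

With a 99.9% target, 1,000,000 requests tolerate about 1,000 failures, so 250 failures leave roughly three quarters of the budget. A team might slow releases as this value approaches zero.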
Soft Skills

Excellent communication and leadership skills. Ability to work in a fast‑paced, collaborative environment.

Preferred Qualifications
  • Experience in regulated industries (Healthcare, Finance, etc.).
  • Certifications such as AWS Solutions Architect, Azure Administrator, or Datadog Certified Professional.