Reliability Engineer - Multi-Cloud Specialist

Bebeecloud

Xico

Hybrid

MXN 1,437,000 - 1,797,000

Full-time

Posted today

Job description

A technology company based in Mexico is seeking an experienced SRE Lead to enhance reliability, scalability, and automation across multi-cloud platforms. The role involves defining key performance indicators, leading CI/CD design, and ensuring high availability of critical systems. The ideal candidate will have strong skills in Datadog, AWS, Azure, and MongoDB, along with expertise in engineering practices to optimize service delivery. This position also emphasizes collaboration with various teams to integrate reliability throughout the software development lifecycle.

Qualifications

Strong experience with Datadog, PagerDuty, Azure, AWS, and MongoDB.
Proficiency in scripting (Python, Bash) and Infrastructure as Code (Terraform, ARM templates).
Hands-on experience with containerization (Docker, Kubernetes).
Deep understanding of SLIs / SLOs, error budgets, and reliability engineering practices.
Expertise in CI/CD tools (Azure DevOps, Jenkins, GitHub Actions).
Strong automation mindset and experience with configuration management tools (Ansible, Chef, or similar).

Responsibilities

Design and implement SRE best practices for monitoring, alerting, and incident response.
Define and track SLIs, SLOs, and SLAs to improve system reliability.
Lead CI/CD pipeline design and optimization for multi-cloud environments.
Automate infrastructure provisioning and deployments using Infrastructure as Code (IaC).
Own incident response processes leveraging PagerDuty and Datadog.
Conduct post-mortems and implement preventive measures.
Architect and manage hybrid cloud environments.
Optimize cost, performance, and security across cloud services.
Ensure high availability and performance of MongoDB clusters.
Implement backup, recovery, and disaster recovery strategies.
Mentor SRE / DevOps engineers and foster a culture of reliability and automation.
Collaborate with development, QA, and product teams.

Skills

Datadog

PagerDuty

Azure

AWS

MongoDB

Python

Bash

Terraform

Docker

Kubernetes

Azure DevOps

Jenkins

GitHub Actions

Ansible

Chef

Job Role Summary

Seeking an experienced SRE Lead to drive reliability, scalability, and automation across multi‑cloud and application platforms.

Job Description

As a seasoned SRE and DevOps Lead, you will combine leadership, hands‑on engineering, and strategic thinking to ensure high availability and performance of mission‑critical systems.

Key Responsibilities

Design and implement SRE best practices for monitoring, alerting, and incident response.
Define and track SLIs, SLOs, and SLAs to improve system reliability.
Lead CI/CD pipeline design and optimization for multi‑cloud environments (Azure & AWS).
Automate infrastructure provisioning and deployments using Infrastructure as Code (IaC).
Own incident response processes leveraging PagerDuty and Datadog for alerting and observability.
Conduct post‑mortems and implement preventive measures.
Architect and manage hybrid cloud environments (Azure, AWS).
Optimize cost, performance, and security across cloud services.
Ensure high availability and performance of MongoDB clusters.
Implement backup, recovery, and disaster recovery strategies.
Mentor SRE / DevOps engineers and foster a culture of reliability and automation.
Collaborate with development, QA, and product teams to embed reliability into the SDLC.

Soft Skills

Excellent communication and leadership skills. Ability to work in a fast‑paced, collaborative environment.

Preferred Qualifications

Experience in regulated industries (Healthcare, Finance, etc.).
Certification as an AWS Solutions Architect, Azure Administrator, or Datadog Certified Professional.
