A technology company based in Mexico is seeking an experienced SRE Lead to enhance reliability, scalability, and automation across multi-cloud platforms. The role involves defining key performance indicators, leading CI/CD design, and ensuring high availability of critical systems. The ideal candidate will have strong skills in Datadog, AWS, Azure, and MongoDB, along with expertise in engineering practices to optimize service delivery. This position also emphasizes collaboration with various teams to integrate reliability throughout the software development lifecycle.
Qualifications
Strong experience with Datadog, PagerDuty, Azure, AWS, and MongoDB.
Proficiency in scripting (Python, Bash) and Infrastructure as Code (Terraform, ARM templates).
Hands-on experience with containerization (Docker, Kubernetes).
Deep understanding of SLIs / SLOs, error budgets, and reliability engineering practices.
Expertise in CI/CD tools (Azure DevOps, Jenkins, GitHub Actions).
Strong automation mindset and experience with configuration management tools (Ansible, Chef, or similar).
Responsibilities
Design and implement SRE best practices for monitoring, alerting, and incident response.
Define and track SLIs, SLOs, and SLAs to improve system reliability.
Lead CI/CD pipeline design and optimization for multi-cloud environments.
Automate infrastructure provisioning and deployments using Infrastructure as Code (IaC).
Own incident response processes leveraging PagerDuty and Datadog.
Conduct post-mortems and implement preventive measures.
Architect and manage hybrid cloud environments.
Optimize cost, performance, and security across cloud services.
Ensure high availability and performance of MongoDB clusters.
Implement backup, recovery, and disaster recovery strategies.
Mentor SRE / DevOps engineers and foster a culture of reliability and automation.
Collaborate with development, QA, and product teams.
Skills
Datadog
PagerDuty
Azure
AWS
MongoDB
Python
Bash
Terraform
Docker
Kubernetes
Azure DevOps
Jenkins
GitHub Actions
Ansible
Chef