Digital S/W Eng Intm Analyst - Control&Hygiene

Buscojobs México

Veracruz

On-site

MXN 200,000 - 400,000

Full-time

Posted today

Job description

A leading recruitment service in Mexico is seeking a Senior Site Reliability Engineer (SRE) who will lead initiatives within IT operations. This full-time role requires expertise in cloud technologies, automation, and SRE tools, with a focus on engineering solutions and operational excellence. Ideal candidates should have around 8-10 years of hands-on experience and capabilities in designing and implementing SRE solutions.

Qualifications

Around 8-10 years of hands-on SRE experience.
Experience with cloud technologies, development, SRE toolsets and automation.
Good understanding of Agile/Waterfall/Scrum/Kanban methodologies.

Responsibilities

Design, develop, and implement cutting-edge SRE solutions.
Drive automation for repetitive operational tasks.
Implement monitoring systems for applications and infrastructure.

Skills

Hands-on SRE experience
Cloud technologies
Automation
AI workflows
Docker/Kubernetes
Linux commands
GitLab CI/CD setup

Tools

Ansible
Splunk
Prometheus
Grafana
Datadog
AppDynamics
Dynatrace

Senior Site Reliability Engineer (SRE) with advanced English skills (B2/C1) for a full-time position.

Location: Mexico

We are seeking a highly skilled Senior SRE with solid experience to help lead transformational initiatives across IT operations and development. In this role, you will help design, develop, and implement cutting-edge SRE solutions and drive IT operations organizations toward an engineering-centric approach.

Key Responsibilities

Be well versed in SRE principles, key metrics, and transformation steps.
Drive automation for repetitive operational tasks (toil reduction) through scripts, playbooks, and self-healing workflows.
Design and implement automated runbooks, pipelines, and reliability blueprints to accelerate incident mitigation and enhance system resiliency.
Knowledge of transforming traditional support into SRE is a great advantage.
Experience in large-scale production environments with ITIL and SRE processes, and a good understanding of ticket management.
Strong understanding of Agile/Waterfall/Scrum/Kanban and of leading SRE deliverables.
Collaborate with development teams on resiliency to ensure that services and applications are designed with operational reliability in mind.
Implement monitoring systems to assess the performance of applications and infrastructure and proactively identify areas for optimization.
Understand incident and problem management processes and post-mortems, and drive improvements to prevent future incidents.
Translate technical language from Spanish to English, mainly within monitoring dashboards and alerting.

Required Skills & Experience

Around 8-10 years of hands-on SRE experience with cloud technologies, development, SRE toolsets, and automation.
Automation experience (data refresh, releases, DB snapshots) using Ansible or a scripting language.
Solid experience building AI workflows and operations orchestration for toil reduction and self-healing issue resolution.
Hands-on experience with AIOps tools and technologies for building AI agents and agentic flows.
Participation in the architecture of reliable, scalable, and high-performance systems and services with a focus on operational excellence, availability, and performance.
Hands-on experience building observability as a service and collecting telemetry data using OpenTelemetry, APM, SolarWinds, open-source tools (Prometheus and Grafana), and log aggregation (Kibana or Splunk).
Experience with single-pane observability dashboards.
Strong hands-on experience with a cloud platform (AWS): Control Tower, project setup, account creation, RDS, SSO.
Solid understanding of and hands-on experience with Docker/Kubernetes.
Good experience with Linux commands, GitLab CI/CD setup, and Terraform (state management, etc.).
Monitoring and alerting setup experience with Splunk, Prometheus, Grafana, Kibana, ELK, etc.
Hands-on experience with APM tools, preferably Datadog, AppDynamics, or Dynatrace.
Good understanding of observability frameworks that use programmatic SLI/SLO blueprints to standardize the collection of golden signals.
Experience with the following languages: Groovy DSL, Java, Python, and YAML, as well as microservices architecture.
Good understanding of and hands-on experience with MQ and Kafka.
Experience with databases (Oracle, MySQL).

Nice to Have

Any relevant professional certification: Certified Site Reliability Engineer (CSRE), Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer Professional, or Google Cloud Professional DevOps Engineer.
Developer background highly desired.

Seniority level

Director

Employment type

Full-time

Job function

Engineering, Business Development, and Consulting
IT Services and IT Consulting
