Digital S/W Eng Intm Analyst - Control&Hygiene

Buscojobs México

Veracruz

On-site

MXN 200,000 - 400,000

Full-time

Today

Vacancy description

A leading recruitment service in Mexico is seeking a Senior Site Reliability Engineer (SRE) who will lead initiatives within IT operations. This full-time role requires expertise in cloud technologies, automation, and SRE tools, with a focus on engineering solutions and operational excellence. Ideal candidates should have around 8-10 years of hands-on experience and capabilities in designing and implementing SRE solutions.

Qualifications

  • Around 8-10 years of hands-on SRE experience.
  • Experience with cloud technologies, development, SRE toolsets and automation.
  • Good understanding of Agile/Waterfall/Scrum/Kanban methodologies.

Responsibilities

  • Design, develop, and implement cutting-edge SRE solutions.
  • Drive automation for repetitive operational tasks.
  • Implement monitoring systems for applications and infrastructure.

Skills

Hands-on SRE experience
Cloud technologies
Automation
AI Workflows
Docker/Kubernetes
Linux Commands
GitLab CI/CD Setup

Tools

Ansible
Splunk
Prometheus
Grafana
Datadog
AppDynamics
Dynatrace
Job description

Senior Site Reliability Engineer (SRE) with advanced English skills (B2/C1) for a full-time position.

Location: Mexico

We are currently seeking a highly skilled Senior SRE Engineer with solid experience to help lead transformational initiatives across IT operations and development. In this role, you will help design, develop, and implement cutting-edge SRE solutions, driving the transformation of IT operations organizations toward an engineering-centric approach.

Key Responsibilities
  • Be well versed in SRE parameters, key metrics, and transformation steps.
  • Drive automation for repetitive operational tasks (toil reduction) through scripts, playbooks, and self-healing workflows.
  • Design and implement automated runbooks, pipelines, and reliability blueprints to accelerate incident mitigation and enhance system resiliency.
  • Knowledge of transforming traditional support into SRE is a great advantage.
  • Experience working in large-scale production environments with ITIL and SRE processes, and a good understanding of ticket management.
  • Strong understanding of Agile/Waterfall/Scrum/Kanban and of leading SRE deliverables.
  • Collaborate with development teams on resiliency to ensure that services and applications are designed with operational reliability in mind.
  • Implement monitoring systems to assess the performance of applications and infrastructure and proactively identify areas for optimization.
  • Understand the incident and problem management process, including post-mortems, and drive improvements to prevent future incidents.
  • Ability to translate technical language from Spanish to English, mainly within monitoring dashboards and alerting.
Required Skills & Experience
  • Around 8-10 years of hands-on SRE experience with cloud technologies, development, SRE toolsets, and automation.
  • Automation experience (data refresh, releases, DB snapshots) using Ansible or any scripting language.
  • Solid experience building AI workflows / operations orchestration for toil reduction and issue resolution with self-healing.
  • Hands-on experience with AIOps tools and technologies for building AI agents and agentic flows.
  • Participate in the architecture of reliable, scalable, and high-performance systems and services with a focus on operational excellence, availability, and performance.
  • Hands-on experience building observability as a service and collecting telemetry data using OpenTelemetry, APM, SolarWinds, open-source tools (Prometheus and Grafana), and log aggregation (Kibana or Splunk).
  • Single-pane observability dashboarding.
  • Strong hands-on experience with a cloud technology (AWS): Control Tower, project setup, account creation, RDS, SSO.
  • Solid understanding of and hands-on experience with Docker/Kubernetes.
  • Good experience with Linux commands, GitLab CI/CD setup, and Terraform (state management, etc.).
  • Monitoring and alerting setup experience with Splunk, Prometheus, Grafana, Kibana, ELK, etc.
  • Hands-on experience with APM tools, preferably Datadog, AppDynamics, or Dynatrace.
  • Good understanding of an observability framework that leverages programmatic SLI/SLO blueprints to standardize the collection of golden signals.
  • Experience with the following languages: Groovy DSL, Java, Python, and YAML, as well as microservices architecture.
  • Good understanding of and hands-on experience with MQ and Kafka.
  • Experience with databases (Oracle, MySQL).
Nice to Have

  • Any of the relevant professional certifications: Certified Site Reliability Engineer (CSRE), Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer Professional, Google Cloud Professional DevOps Engineer. A developer background is highly desired.

Seniority level
  • Director
Employment type
  • Full-time
Job function
  • Engineering, Business Development, and Consulting
  • IT Services and IT Consulting
