Site Reliability Engineer

Monterrey

Hybrid

MXN 1,080,000 - 1,621,000

Full-time

Posted yesterday

Job overview

A leading technology firm is seeking a Site Reliability Engineer (SRE) in Monterrey. The SRE will focus on Application Performance Monitoring (APM) and system optimization, ensuring the reliability and performance of critical applications. This role involves designing monitoring strategies, analyzing performance, automating tasks, and collaborating with engineering teams. Candidates should have 5+ years of relevant experience, a degree in a related field, and fluent English. The position offers competitive compensation and a hybrid work model.

Benefits

Competitive compensation

Hybrid work model

Qualifications

5+ years of experience in Site Reliability, DevOps, or Performance Engineering roles.
Proven hands‑on experience with APM tools such as Elastic APM, Datadog, Dynatrace, etc.
Expertise in the Elastic Stack (Elasticsearch, Logstash, Kibana, Beats) for logging and monitoring.

Responsibilities

Design and manage APM strategies using various tools.
Perform deep performance analysis and identify bottlenecks.
Build real-time dashboards and alerting systems.

Skills

Application Performance Monitoring (APM)

Performance Analysis

System Optimization

Scripting (Python, Bash, PowerShell)

Collaboration

Problem-solving

Education

Bachelor's or Master's degree in Computer Science, Engineering, or related field

Tools

Elastic APM

Datadog

Grafana

Docker

Kubernetes

JOB DESCRIPTION

Site Reliability Engineer (SRE) - Application Performance Monitoring (APM)

Location: Monterrey, Nuevo León, Mexico (Hybrid - candidates must reside in Monterrey or the metropolitan area)

Language requirement: Fluent English (spoken and written)

About the Role

We're looking for a Site Reliability Engineer (SRE) with a passion for Application Performance Monitoring (APM) and system optimization.

In this role, you'll be at the heart of ensuring the reliability, scalability, and performance of NOV's mission‑critical applications. You'll work closely with software engineering and operations teams to design monitoring strategies, analyze performance, and prevent issues before they affect users.

If you thrive in fast‑paced environments, love solving complex technical challenges, and enjoy turning data into insight, this is the role for you.

What You'll Do

Design and manage APM strategies using tools like Elastic APM, Datadog, Dynatrace, or similar platforms.
Perform deep performance analysis, tracing distributed requests and identifying bottlenecks in both code and infrastructure.
Build real‑time dashboards and alerting systems using Grafana, Kibana, or equivalent tools to visualize system health.
Proactively monitor systems to detect performance degradations, security threats, and system failures before users are impacted.
Define and track Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to continuously improve reliability.
Lead Root Cause Analysis (RCA) sessions after incidents and implement corrective actions to prevent recurrence.
Automate repetitive tasks and monitoring setups using Python, Bash, or PowerShell.
Collaborate with cross‑functional teams to embed reliability, performance, and observability best practices into every stage of development.
Continuously refine tools, processes, and APM strategies to enhance efficiency, reliability, and visibility across platforms.
Engage with stakeholders to understand performance challenges and shape the platform roadmap.

What You Bring

Bachelor's or Master's degree in Computer Science, Engineering, or related field.
5+ years of experience in Site Reliability, DevOps, or Performance Engineering roles.
Proven hands‑on experience with APM tools such as Elastic APM, Datadog, Dynatrace, New Relic, or AppDynamics.
Expertise in the Elastic Stack (Elasticsearch, Logstash, Kibana, Beats) for logging, monitoring, and APM.
Deep understanding of SRE principles, DevOps methodologies, and Production Support operations.
Strong scripting ability in Python, Bash, or PowerShell for automation and analysis.
Solid grasp of Linux/Unix systems, networking fundamentals, and distributed system architecture.
Experience with containerization (Docker) and orchestration (Kubernetes).
Excellent analytical, problem‑solving, and collaboration skills, with the ability to communicate effectively in a global team.

Preferred Skills

Fluent English (mandatory, as stated in the language requirement above).
Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or Chef.
Familiarity with cloud‑native services (AWS, Azure, or GCP) and serverless architectures (AWS Lambda, Azure Functions).
Knowledge of CI/CD tools like GitHub Actions, Azure DevOps, or Jenkins.
Understanding of other observability pillars, including metrics (Prometheus) and logging.
Experience working in agile environments.

Why NOV

At NOV, we combine over 150 years of innovation with cutting‑edge technology to power the global energy industry.

You'll join a global engineering team that values collaboration, curiosity, and continuous improvement, giving you the opportunity to make a real impact on systems that matter.
