Site Reliability Developer 4

Oracle

Zapopan

On-site

MXN 800,000 - 1,200,000

Full-time

Posted 30+ days ago

Job summary

A leading technology company in Zapopan seeks an experienced Site Reliability Engineer to design and automate critical systems. This role directly impacts the reliability of cloud services used globally. Candidates should have strong skills in Linux and Python, and experience with CI/CD pipelines. The position offers opportunities for improving operational efficiency and working within Agile environments.

Background

  • Advanced Linux systems administration skills are essential.
  • Strong coding skills in Python with a focus on automation are required.
  • Experience with CI/CD pipelines and deployment automation is necessary.

Responsibilities

  • Collaborate with SRE and development teams for reliability across services.
  • Design and implement software tools for enhanced monitoring and automation.
  • Own metrics and dashboards that track system health and performance.
  • Provide on-call support as part of a rotational schedule.

Skills

Advanced Linux systems administration
Strong coding skills in Python
Intermediate experience with Bash/Shell scripting
Familiarity with networking principles
Basic to intermediate knowledge of databases
Understanding of unit testing
Experience with CI/CD pipelines
Comfortable working in Agile environments

Tools

Prometheus
Grafana
New Relic

Job description

As part of the Site Reliability Engineering (SRE) team, you’ll contribute to designing, automating, and evolving mission-critical systems. You'll combine deep systems expertise with modern software engineering practices to reduce operational toil and build resilient, self-healing services.

This is a high-impact role where your work directly affects the reliability of cloud services used by thousands of customers around the world.

Qualifications

Career Level - IC4

Responsibilities

What You’ll Do:

  • Collaborate with SRE and development teams to ensure end-to-end reliability across a wide range of services and technology stacks.
  • Design, write, and deploy software and automation tools that enhance availability, observability, and scalability.
  • Own and evolve metrics, SLOs, SLAs, KPIs, and dashboards that track system health and customer experience.
  • Build tooling to reduce manual operations and eliminate sources of toil.
  • Improve CI/CD pipelines, deployment processes, and validation frameworks for reliability and efficiency.
  • Review and influence architectural designs for distributed systems with a focus on resilience, performance, and fault tolerance.
  • Lead and participate in post-incident reviews, capacity planning, and production-readiness assessments.
  • Provide on-call support on a rotational basis (12-hour shifts, 7-day coverage).

What We’re Looking For:

  • Advanced Linux systems administration
  • Strong coding skills in Python (automation-focused)
  • Intermediate experience with Bash/Shell scripting
  • Familiarity with networking principles and distributed systems behavior
  • Basic to intermediate knowledge of databases (e.g., SQL, NoSQL)
  • Understanding of unit testing and modern software engineering practices
  • Experience with CI/CD pipelines and deployment automation
  • Comfortable working in Agile development environments

Nice to Have:

  • Exposure to monitoring/observability tools (e.g., Prometheus, Grafana, New Relic)
  • Experience building internal tools for operational efficiency
  • Participation in SRE culture: blameless postmortems, runbooks, and service design reviews