Site Reliability Engineer

Everscale Group

Ciudad de México

On-site

USD 60,000 - 100,000

Full-time

Posted 30+ days ago

Job summary

Join a forward-thinking company as a Site Reliability Engineer, where you'll play a crucial role in the product life-cycle. This position involves designing and architecting distributed systems, enhancing visibility through monitoring solutions, and ensuring the reliability of both on-prem and cloud resources. You'll leverage your expertise in cloud computing and automation technologies to optimize performance and reduce incident resolution times. If you're passionate about building robust systems and thrive in a dynamic environment, this opportunity is perfect for you.

Background

  • Experience in monitoring infrastructure and application availability.
  • Strong knowledge of cloud computing and containerization technologies.

Responsibilities

  • Design and architect distributed systems in the cloud.
  • Create monitoring and alerting solutions for application performance.
  • Develop and troubleshoot large-scale production systems.

Skills

Monitoring infrastructure
Cloud Computing (AWS preferred)
Containerization (Docker, Kubernetes)
Systems Administration
Automation (Chef, Puppet, Terraform)
Programming (Python, Golang, Java)
Understanding of network protocols

Tools

Elasticsearch
Prometheus
Graphite
Kafka
VMware
Jenkins

Job description

As a Site Reliability Engineer, your role covers the entire life-cycle of a product – from helping developers with architecture and delivery to on-call incident response and triage. You will be responsible for on-prem and cloud resources and should have a good understanding of cloud infrastructure fundamentals.

Responsibilities:
  • You will design and architect distributed systems in the cloud and understand how to move systems from on-prem data centers to the cloud.
  • You will create monitoring, alerting, and dashboarding solutions that improve visibility into EA’s application performance and business metrics.
  • You will develop and troubleshoot distributed, large-scale production systems spanning on-prem and cloud-based hosting.
  • You will perform root cause analysis and post-mortems with an eye toward future prevention.
  • You will use automation technologies to ensure repeatability, eliminate toil, repair services, and reduce mean time to detection (MTTD) and mean time to resolution (MTTR).
  • You will design CI/CD pipelines.
  • You will produce documentation and support tooling for online support teams.

Qualifications:
  • Experience monitoring infrastructure and application availability against service level indicators (SLIs) to meet service level objectives (SLOs).
  • Experience with virtualization, containerization, cloud computing (AWS preferred), VMware ecosystems, Kubernetes, or Docker.
  • Knowledge of Elasticsearch, Prometheus, Graphite, and Kafka.
  • Systems Administration experience, including an understanding of *nix.
  • Network experience, including an understanding of standard protocols/components.
  • Automation and orchestration experience including Chef, Puppet, Terraform, Packer, or Jenkins.
  • Experience writing code in Python, Golang, and/or Java.
  • Experience working with distributed systems.