Senior Site Reliability Engineer

Royal Caribbean International

Mexico City

On-site

MXN 1,470,000 - 2,206,000

Full-time

Posted today

Job summary

A global cruise line is seeking a Senior Site Reliability Engineer in Mexico City. The role involves managing incident responses, overseeing application performance, and collaborating with IT teams. Qualified candidates should have 6-10 years in SRE or related IT roles and a Bachelor's degree in Computer Science. This position requires significant technical expertise in cloud platforms and excellent communication skills.

Benefits

Competitive compensation
Career development opportunities

Background

  • 6-10 years in Site Reliability Engineering, DevOps, QA, or a related IT operations role.
  • Proficiency in cloud platforms such as AWS and understanding of API design principles.
  • Experience with monitoring and logging tools.

Responsibilities

  • Responsible for Incident Management and Application Performance.
  • Lead a team to react quickly to production incidents.
  • Proactively monitor and manage application performance.

Skills

Site Reliability Engineering
DevOps
QA
Strong analytical skills
Excellent communication

Education

Bachelor’s degree in Computer Science

Tools

AWS
AppDynamics
Datadog
Splunk
New Relic

Job description
Overview

Journey with us! Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to offer a competitive compensation and benefits package and excellent career development opportunities, each offering unique ways to explore the world. Royal Caribbean Group’s Global eCommerce has an exciting career opportunity for a full-time Senior Site Reliability Engineer reporting to the Sr. Manager, Site Reliability Engineer.

This position will work on-site in Mexico City.

Position

The Senior Site Reliability Engineer supports the Royal Caribbean website, using application and user performance data to guide informed decision-making.

Responsibilities
  • Product Health: Responsible for the Incident Management, Application Performance, Configuration Management and Operational Readiness of the products within ownership. Partner with stakeholders from IT teams to ensure performance, configuration, and monitoring tools meet product needs.
  • Incident Management: Lead a team prepared to react quickly to production incidents to restore systems and applications to normal service operation, minimizing impact on guest/crew experience and business operations. Review ticket analysis, approve closures, understand website architecture, and escalate as needed. Communicate incident details to production teams and stakeholders, document incidents, perform postmortems, and follow up on actions.
  • Application Performance Management (APM): Proactively monitor and manage performance and availability of applications. Detect and diagnose complex performance problems, provide insights into metrics, and create reports on deployment build performance.
  • Configuration Management: Lead the implementation and maintenance of technology standards across product definition and configuration. Adjust health thresholds and monitoring settings; create and maintain performance dashboards; maintain alerting, communication, and documentation tools.
  • Change Control Governance: Ensure production changes are planned, authorized, tested, and validated from a monitoring perspective.
  • Production Operations Readiness: Ensure all product implementations go through operational readiness reviews and maintain clear communication channels with Scrum and marketing teams.
Qualifications, Knowledge and Skills
  • 6-10 years in Site Reliability Engineering (SRE), DevOps, QA, or a related IT operations role.
  • Bachelor’s degree in Computer Science, Information Technology, Computer Engineering, or other relevant advanced degree preferred.
  • Technical Expertise: Proficiency in cloud platforms such as AWS, including AWS Elastic Beanstalk; understanding of API design principles (REST, SOAP, Graph); advanced knowledge of monitoring and logging tools (AppDynamics, Datadog, Splunk, New Relic, etc.); experience with Adobe AEM Cloud is preferred.
  • Problem-Solving Skills: Strong analytical and troubleshooting skills to diagnose and resolve complex production issues swiftly; ability to develop and implement effective incident response plans.
  • Communication and Collaboration: Excellent written and verbal communication; ability to collaborate with Development, QA, IT, and external managed service providers.
  • Work Environment: On-call rotation may be required to handle urgent incidents and ensure 24x7 system reliability; on-call duties may include evenings, weekends, and holidays as needed.

We know there’s a lot to consider. As you go through the application process, our recruiters will provide guidance and answer questions. Thank you for your interest in Royal Caribbean Group. We hope to see you onboard soon!

It is the policy of the Company to ensure equal employment and promotion opportunity to qualified candidates without discrimination or harassment on the basis of race, color, religion, sex, age, national origin, disability, sexual orientation, sexuality, gender identity or expression, marital status, or any other characteristic protected by law. Royal Caribbean Group and each of its subsidiaries prohibit and will not tolerate discrimination or harassment.
