
Senior Director - Site Reliability Engineering

buscojobs España

Barcelona

On-site

EUR 90.000 - 160.000

Full-time

Posted yesterday

Job summary

A leading company is seeking a Senior Director of Site Reliability Engineering in Barcelona to oversee SRE operations and promote reliability across digital products. The ideal candidate will have extensive experience in leading SRE teams, implementing effective strategies, and fostering a culture of continuous improvement in technology solutions.

Qualifications

  • 12+ years of applied experience in site reliability or software engineering.
  • 5+ years leading technologists managing complex technical issues.
  • Strong expertise in designing and implementing reliability strategies.

Responsibilities

  • Lead three regional SRE teams and enhance team effectiveness.
  • Develop SLOs and SLAs aligned with business goals.
  • Implement best practices for observability and incident management.

Skills

.NET
Angular
SQL
NoSQL
AWS
Azure
Observability
Performance Engineering

Education

Formal training or certification in site reliability or software engineering

Tools

Prometheus
Grafana
Dynatrace

Job description

Join a world-class team of skilled engineers who build creative digital solutions to support our colleagues and clients. We make a broad organizational impact by delivering cutting-edge technology solutions that power Gartner. Gartner IT values its culture of nonstop innovation, an outcome-driven approach to success, and the notion that great ideas can come from anyone on the team.

About the role:

Are you a versatile reliability expert? Do you excel as an architect passionate about building resilient, performant, and scalable systems? If you're a technical leader thriving at the intersection of software platforms and on-premises infrastructure, we want to meet you!

We seek a seasoned Sr. Director of Site Reliability Engineering (SRE) responsible for developing and implementing a comprehensive strategy for site reliability, including scalability, performance, and reliability improvements. This role will align SRE objectives with Gartner's conference business goals and technology roadmaps, fostering continuous improvement within the SRE team to meet Global Conferences MCPs.

The individual will oversee SRE team operations, ensuring the reliability and availability of all technical services so that they deliver a world-class experience to clients. This role involves working with Service Management to enforce best practices for system reliability, monitoring, capacity planning, incident response, problem management, disaster recovery, change management, and workflow automation. The role also requires close collaboration with the Global SRE Practice Leader to implement standard tools and technologies for tracking SRE metrics and improvement areas.

What you'll do:

  • Lead three regional, cross-functional SRE teams, providing leadership, mentorship, and talent strategy to enhance team effectiveness and performance.
  • Design and implement a holistic reliability strategy aligned with Gartner's Conference Business Objectives and Technology Roadmap, focusing on resiliency engineering, performance optimization, and platform stability.
  • Partner with executive leadership to communicate SRE initiatives, advocate for ROI, and drive organizational change.
  • Work with business and technology leaders to develop SLOs and SLAs aligned with business goals.
  • Promote a culture of automation, tooling, and continuous improvement to reduce toil.
  • Architect and build scalable, secure, and resilient digital products using .NET, Node.js, and Java across on-premises and cloud environments such as AWS and Azure, including multi-region disaster recovery (DR) strategies.
  • Implement best practices for observability, leveraging tools and techniques to gain deep insights into system health and performance.
  • Establish 24x7 Incident Management processes in partnership with Gartner's NOC; oversee incident response, root cause analysis, and postmortem procedures.
  • Track and improve metrics like MTTI, MTTR, and MTTF for all technology services.
  • Stay updated on the latest trends and technologies in reliability, performance engineering, and cloud computing.
  • Manage reliability budgets and resources effectively.
  • Advocate for reliability best practices organization-wide.
  • Travel to premium destination conferences for onsite support.
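As an illustration of the SLO work described above (not Gartner's own tooling), an availability SLO implies an error budget: the amount of downtime a service can accumulate within a measurement window before it misses its target. The following is a minimal Python sketch that assumes a simple time-based availability SLO; the function names, the 99.9% target, and the 30-day window are hypothetical.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed downtime for an availability SLO over a measurement window.

    slo is a fraction, e.g. 0.999 for a 99.9% availability target.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    # The error budget is the share of the window the service may be unavailable.
    return window * (1.0 - slo)


def budget_consumed(slo: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the error budget used by observed downtime (may exceed 1.0)."""
    return downtime / error_budget(slo, window)


if __name__ == "__main__":
    # Hypothetical example: 99.9% availability target over a 30-day window.
    window = timedelta(days=30)
    print(error_budget(0.999, window))  # roughly 43 minutes
    used = budget_consumed(0.999, window, timedelta(minutes=20))
    print(f"{used:.0%} of the error budget used")
```

With these assumed values the budget is about 43 minutes, so 20 minutes of incident downtime would use a little under half of it. Request-based SLOs (good requests divided by total requests) follow the same arithmetic, with request counts in place of time.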

What you'll need:

  • Formal training or certification in site reliability or software engineering, with 12+ years of applied experience; 5+ years leading technologists managing complex technical issues.
  • Minimum 7 years supporting site reliability engineering, design, scaling, resilience, and performance assessments for critical applications.
  • Strong expertise in .NET, Angular, and SQL and NoSQL technologies.
  • Extensive experience with AWS and Azure cloud platforms, including services and technologies such as EKS, serverless computing, CDNs, and GraphQL.
  • Experience leading SRE teams for SaaS applications and managing high-severity incidents.
  • Deep knowledge of observability tools (Prometheus, Grafana, Dynatrace, etc.) and resiliency principles.
  • Proven ability to design and implement performance strategies.
  • Excellent communication, collaboration, and leadership skills.
  • A strategic mindset focused on proactive reliability approaches.