Senior Director - Site Reliability Engineering

Gartner

Barcelona

On-site

EUR 90,000 - 160,000

Full-time

Posted 30+ days ago

Vacancy description

An innovative company is seeking a Sr. Director of Site Reliability Engineering to lead a talented team in building resilient and scalable systems. This role involves developing strategies for site reliability, performance, and incident management while collaborating with executive leadership to drive organizational change. You will oversee the operations of SRE teams, promoting a culture of automation and continuous improvement. Ideal candidates will have extensive experience with cloud platforms like AWS and Azure, along with a strong background in .NET and performance optimization. This is an exciting opportunity to make a significant impact in a forward-thinking environment.

Background

  • 12+ years of experience in site reliability engineering and software engineering.
  • 5+ years of leadership experience managing complex technical issues.

Responsibilities

  • Lead cross-functional SRE teams to enhance effectiveness and performance.
  • Design a reliability strategy aligned with business objectives.

Skills

.NET
AWS
Azure
Site Reliability Engineering
Leadership
Performance Optimization
Incident Management
Observability Tools
Collaboration
Communication

Education

Formal training in site reliability or software engineering

Tools

Prometheus
Grafana
Dynatrace
EKS
Serverless
CDN
GraphQL

Job description

Join a world-class team of skilled engineers who build creative digital solutions to support our colleagues and clients. We make a broad organizational impact by delivering cutting-edge technology solutions that power Gartner. Gartner IT values its culture of nonstop innovation, an outcome-driven approach to success, and the notion that great ideas can come from anyone on the team.

About the role:

Are you a versatile reliability expert? Do you excel as an architect passionate about building resilient, performant, and scalable systems? If you're a technical leader thriving at the intersection of software platforms and on-premises infrastructure, we want to meet you!

We seek a seasoned Sr. Director of Site Reliability Engineering (SRE) responsible for developing and implementing a comprehensive strategy for site reliability, including scalability, performance, and reliability improvements. This role will align SRE objectives with Gartner's conference business goals and technology roadmaps, fostering continuous improvement within the SRE team to meet Global Conferences MCPs.

The individual will oversee SRE team operations, ensuring the reliability and availability of all technical services delivering a world-class experience to clients. This role involves working with Service Management to enforce best practices for system reliability, monitoring, capacity planning, incident response, problem management, disaster recovery, change management, and workflow automation. Collaboration with the Global SRE Practice Leader to implement standard tools and technologies for SRE metrics and improvement areas is also essential.

What you'll do:

  • Lead three regional, cross-functional SRE teams, providing leadership, mentorship, and talent strategy to enhance team effectiveness and performance.
  • Design and implement a holistic reliability strategy aligned with Gartner's Conference Business Objectives and Technology Roadmap, focusing on resiliency engineering, performance optimization, and platform stability.
  • Partner with executive leadership to communicate SRE initiatives, advocate for ROI, and drive organizational change.
  • Develop SLOs and SLAs with Business and Technology leaders aligned with business goals.
  • Promote a culture of automation, tooling, and continuous improvement to reduce toil.
  • Architect and build scalable, secure, and resilient digital products using .NET, Node.js, and Java across on-premises and cloud environments such as AWS and Azure, including multi-region DR strategies.
  • Implement best practices for observability, leveraging tools and techniques to gain deep insights into system health and performance.
  • Establish 24x7 Incident Management processes in partnership with Gartner's NOC; oversee incident response, root cause analysis, and postmortem procedures.
  • Track and improve metrics like MTTI, MTTR, and MTTF for all technology services.
  • Stay updated on the latest trends and technologies in reliability, performance engineering, and cloud computing.
  • Manage reliability budgets and resources effectively.
  • Advocate for reliability best practices organization-wide.
  • Travel to premium destination conferences for onsite support.

What you'll need:

  • Formal training or certification in site reliability or software engineering, with 12+ years of applied experience; 5+ years leading technologists managing complex technical issues.
  • Minimum 7 years supporting site reliability engineering, design, scaling, resilience, and performance assessments for critical applications.
  • Strong expertise in .NET, Angular, SQL, and NoSQL technologies.
  • Extensive experience with AWS and Azure cloud platforms, including services and technologies such as EKS, Serverless, CDN, and GraphQL.
  • Experience leading SRE teams for SaaS applications and managing high-severity incidents.
  • Deep knowledge of observability tools (Prometheus, Grafana, Dynatrace, etc.) and resiliency principles.
  • Proven ability to design and implement performance strategies.
  • Excellent communication, collaboration, and leadership skills.
  • A strategic mindset focused on proactive reliability approaches.