Remote Site Reliability Engineer — Cloud Scale & Automation

CrowdStrike

Madrid

Hybrid

EUR 70,000 - 120,000

Full-time

Posted today

Job description

A leading cybersecurity firm is searching for a Site Reliability Engineer in Madrid to ensure the reliability and scalability of its NG-SIEM platform. The ideal candidate will manage the platform's performance and build automation tools while collaborating cross-functionally. Candidates should possess strong programming skills in Go, extensive cloud expertise, and a solid understanding of distributed systems. Experience with monitoring tools is essential, along with good communication and incident management skills. This role offers a remote-friendly culture and competitive compensation.

Benefits

Remote-friendly work culture

Market leader in compensation and equity awards

Comprehensive wellness programs

Competitive vacation and holidays

Paid parental and adoption leaves

Professional development opportunities

Employee Networks and volunteer opportunities

Vibrant office culture

Qualifications

Experience in Site Reliability Engineering, DevOps, or similar roles.
Strong programming skills in at least one language (Go).
Deep cloud expertise with hands-on experience in AWS or GCP.
Knowledge of distributed systems design patterns and fault tolerance.
Proficiency with IaC tools and configuration management.
Experience with Kubernetes, Docker, and container-based deployment.
Hands-on experience with monitoring and observability tools.
Experience building and maintaining CI/CD pipelines.
Proven track record of managing high-severity incidents.
Ability to analyze system metrics and logs.
Excellent verbal and written communication skills.

Responsibilities

Own the availability and performance of NG-SIEM platform services.
Design and implement automation solutions for operational tasks.
Develop comprehensive observability solutions using various metrics.
Lead incident response efforts and conduct post-mortems.
Analyze system performance data and forecast infrastructure needs.
Define and maintain Service Level Objectives.
Implement strategies to optimize cloud resource utilization.
Collaborate with engineering teams to improve system design.
Participate in on-call rotation for critical production systems.
Create and maintain operational procedures and documentation.

Skills

Site Reliability Engineering

DevOps

Programming (Go)

Cloud expertise (AWS/GCP)

Distributed systems knowledge

Infrastructure as Code (Terraform)

Container orchestration (Kubernetes)

Observability tools (Prometheus, Grafana)

CI/CD pipelines

Incident management

Data-driven approach

Communication skills

Tools

Prometheus

Grafana

Docker

Kubernetes

Terraform
