Site Reliability Engineer

Verisure

Spain

On-site

EUR 40.000 - 80.000

Full-time

Posted 30+ days ago

Job description

Join a forward-thinking company as a Site Reliability Engineer, where you'll enhance system reliability and performance. This role is pivotal in ensuring high availability and optimizing operational efficiency through automation and collaboration with development teams. You'll design and implement CI/CD pipelines, utilize Infrastructure as Code practices, and lead incident management efforts. If you're passionate about creating robust systems and improving user experiences, this is the perfect opportunity to make a significant impact in a dynamic environment.

Qualifications

Advanced skills in monitoring, logging, and observability.
Proficiency in Python and Bash for automation.

Responsibilities

Design and build systems using automation and develop scripts.
Lead incident management and conduct post-mortems.

Skills

Monitoring, Logging, Observability

Automation (Python, Bash)

Configuration as Code (Ansible)

Containerization (Docker, Kubernetes)

Database Management

Version Control (Git)

Tools

Terraform

Java

Spring Boot

Job Description. What is it?

Role Description (concept):

The purpose of the Site Reliability Engineer (SRE) role is to enhance and maintain the high availability and reliability of systems and applications, ensuring they effectively support business operations and contribute to a positive user experience. This role combines practices from software engineering and operations to create robust and efficient systems. Responsibilities include:

Enhancing system availability and reliability.
Managing incidents proactively to minimize downtime.
Optimizing system performance and scalability.
Implementing automation to improve operational efficiency.
Collaborating with security teams to enhance system protection.
Developing disaster recovery strategies.
Maintaining documentation for knowledge sharing.
Working with development teams to embed reliability in design.
Continuously evaluating and optimizing system performance and processes.
Supporting business growth through reliable infrastructure.

What does he/she do? (tasks):

Architecture:

- Participate in architecture decisions to ensure system resiliency from the start of software development.

Automation and Orchestration:

- Develop scripts and use tools to automate deployment, infrastructure provisioning, configuration management, and scaling, using CI/CD practices.

- Orchestrate workflows across environments to ensure consistency and reliability.

CI/CD:

- Design, implement, and manage CI/CD pipelines for rapid, reliable code deployment with minimal manual intervention, including automated testing.

Infrastructure as Code:

- Promote the use of IaC tools and practices for reproducible, scalable, and maintainable environments.

Monitoring, Logging, and Alerting:

- Implement monitoring and logging solutions to analyze performance data and generate alerts.

- Use observability data to proactively resolve issues, ensuring high availability.

Performance Optimization:

- Regularly assess and optimize system response times, resource use, and user satisfaction.

Incident Management and Reliability Engineering:

- Participate in on-call rotations, resolve incidents swiftly, and conduct post-mortems to prevent recurrence.

- Develop resilience and recovery strategies to meet SLOs.
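
For candidates unfamiliar with the term, meeting an SLO is commonly tracked through an error budget, the amount of unavailability the target still allows. The sketch below is illustrative only and does not come from the posting: the function names, the 30-day window, and the 99.9% target are all assumptions chosen for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43 minutes of downtime, and `budget_remaining` shows how much of that allowance is left after recorded incidents.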

Security and Compliance:

- Ensure adherence to security and compliance standards in all operations.

- Conduct security audits and address vulnerabilities.

Quality Assurance (QA):

- Support QA by setting up environments and deploying tools.

- Collaborate to automate testing and evaluate non-functional testing outcomes.

Responsibilities

Design, build, and scale systems using automation; develop automation scripts.
Lead incident management, conduct post-mortems, and develop preventive strategies.
Define and monitor reliability metrics; analyze data for improvements.
Collaborate with development teams to ensure reliability from design phase; promote SRE principles.
Lead capacity planning and scalability strategies.
Identify inefficiencies and champion new technologies.
Strengthen system security through initiatives and vulnerability management.

Mandatory Skills: