Site Reliability Engineer - Fixed Term Contract

MMT

Barcelona

On-site

EUR 50,000 - 70,000

Full-time

Posted 7 days ago

Job description

A leading technology company in Barcelona is seeking an experienced Site Reliability Engineer to enhance automation and optimize cloud infrastructure. Responsibilities include ensuring system reliability, leading incident responses, and collaborating closely with development teams. Candidates should have expertise in automation tools and cloud platforms, particularly Azure and AWS. This role offers an opportunity to work in a dynamic environment focused on scalability and performance improvements.

Qualifications

Proven experience in running and maintaining production systems.
Expertise in triaging and solving incidents.
Proficiency in automation and configuration management tools.

Responsibilities

Own the uptime and performance of critical infrastructure and applications.
Design scalable, fault-tolerant architectures.
Lead incident response and conduct root cause analyses.

Skills

Automation

System architecture

Problem-solving

Cloud platforms knowledge

Tools

Terraform

Docker

Kubernetes

GitHub Actions

Azure DevOps

Datadog

Overview

The Role

We are seeking an experienced Site Reliability Engineer to play a pivotal role in bridging the gap between software engineering and operations. This role emphasizes designing robust solutions, mentoring teams, and driving performance improvements for both internal and client systems through expertise in automation, scalability, and system reliability. As a Site Reliability Engineer, you will be responsible for owning the uptime and performance of critical infrastructure and applications while working closely with clients to align reliability goals with their business objectives.

Responsibilities

System Reliability & Performance

Own the uptime and performance of critical infrastructure and applications
Design scalable, fault-tolerant architectures that meet business needs while maximizing operational efficiency
Define and govern Non-Functional Requirements (NFRs) such as availability, performance, and maintainability for internal and client systems
Proactively identify opportunities for system optimization, scalability, and cost reduction

Automation & Infrastructure

Design and implement automation for monitoring, incident response, and repetitive operational tasks
Implement Infrastructure as Code (IaC) practices with tools like Terraform, ARM templates, and CloudFormation for consistent cloud environment provisioning
Design and deploy containerized solutions using Docker and Kubernetes on cloud platforms
Set up and manage CI/CD pipelines and frameworks tailored for cloud-native applications using GitHub Actions and Azure DevOps

Cloud Infrastructure & Operations

Participate in architectural decisions and implement cloud infrastructure solutions using Azure and AWS services, ensuring high availability and scalability
Manage and optimize cloud resources to improve performance, cost efficiency, and security
Apply cloud-native best practices to secure and govern cloud environments, ensuring compliance with industry standards
Integrate advanced monitoring and alerting tools (e.g., Datadog, CloudWatch, Application Insights) to maintain system observability in multi-cloud environments

Incident Management & Analysis

Lead incident response, conduct root cause analyses, and produce blameless postmortems to prevent future occurrences
Collaborate with development teams to integrate observability and performance metrics into the development lifecycle
Build and maintain executive-level and developer-centric dashboards to visualize key metrics

Technical Expertise

Proven experience in running and maintaining production systems with expertise in triaging and solving incidents
Proficiency in automation and configuration management tools (e.g., Terraform, Ansible)
Expertise in cloud platforms, particularly Azure and AWS, and their associated tools
Strong programming skills, with a primary focus on Python, for developing automation scripts, creating custom tooling, and optimizing operational workflows
Experience with modern observability platforms such as Datadog

Skills & Experience

A solid foundation in system architecture, with a focus on scalability and reliability
Exceptional problem-solving skills and a data-driven mindset

Desirable Requirements

Experience with container orchestration tools such as Kubernetes
Familiarity with CI/CD pipelines and tools like GitHub Actions and Azure DevOps
Knowledge of security best practices in cloud and hybrid environments
