Site Reliability Engineer

Albacete

EUR 40,000 - 70,000

Job Description

As a Site Reliability Engineer (SRE) on our Platform team, you’ll be responsible for designing, implementing, and maintaining SignalWire’s applications and services. You will work closely with internal engineering teams and directly with customers when necessary to diagnose, debug, and resolve critical platform issues. If you’re passionate about automation, observability, operational excellence, and customer-focused problem solving, we want to hear from you.

Position Responsibilities

Build and maintain reliable, scalable, and secure infrastructure to support our applications and services.
Collaborate directly with customers as needed, including joining customer-facing calls to troubleshoot, debug, and resolve urgent or complex issues.
Monitor system health and performance to identify and address potential issues before they impact customers.
Use Infrastructure as Code (IaC) tools such as Terraform to automate the provisioning and management of cloud resources.
Design, implement, and maintain secure and reliable network architectures within our cloud environment.
Develop observability solutions (logging, metrics, tracing) to provide insights for developers and operations.
Design and implement containerized application architectures using Docker and Kubernetes.
Participate in on-call rotations and lead incident response for critical issues.
Support and triage networking, cloud, or container-based issues.
Meet delivery commitments, communicate progress, and take ownership of the outcomes.
Support your team members in meeting their commitments, ensuring code quality and reliability.

Qualifications

3+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering.
Experience with containerization (Docker) and orchestration (Docker Swarm, Kubernetes).
Experience with infrastructure-as-code (Terraform, Pulumi) and configuration management (Ansible).
Solid grasp of observability, with experience in setting up monitoring, logging, and tracing.
Knowledge of networking protocols, security best practices, and Linux system administration.
Understanding of cloud platforms (GCP, AWS, Azure) and experience with multi-cloud deployments.
Familiarity with building and maintaining CI/CD pipelines with GitHub Actions.
Experience programming in Python, Ruby, Shell, Rust, Elixir, or others.
Familiarity with the technologies we use, or similar ones, such as Docker Swarm, Grafana, Loki, ELK, ClickHouse, Rails, Node.js, Redis, and PostgreSQL.
Experience working with WebRTC and VoIP servers and services is a plus.
Commitment to collaboration, learning, sharing knowledge, and contributing to the team culture.

J-18808-Ljbffr