
Senior Site Reliability Engineer

Playnetic

Madrid

Remote

EUR 40,000 - 75,000

Full-time

Posted 30+ days ago

Job Summary

Playnetic is seeking a Site Reliability Engineer for its remote gaming entertainment team. The role involves ensuring system reliability, maintaining monitoring and alerting systems, and automating processes for operational stability. Candidates should have at least 3 years of experience in Linux and AWS, with proficiency in observability tools like Prometheus and Grafana. Join a fast-growing B2B iGaming startup and contribute to powering the future of iGaming technology.

Qualifications

  • 3+ years of experience working with Linux and AWS environments.
  • Hands-on experience with observability tools, and strong skills in containerization and orchestration.
  • Excellent communication and collaboration skills.

Responsibilities

  • Own and operate the observability and monitoring stack to ensure system reliability.
  • Design and maintain monitoring, alerting, and logging systems; automate repetitive tasks.
  • Lead incident response and document operational procedures.

Skills

Linux
AWS
Prometheus
Grafana
Python
Bash
Terraform
Kubernetes

Job Description

Established in 2023, Playnetic is a new player in the world of gaming entertainment. We design and build slot games from scratch, from idea to release. Our games will be played in regulated markets worldwide through industry-leading operators.

Our innovative gaming content is centred around our core values: quality gaming, dedicated customer service, and reliable delivery to our partners.

As a fully remote studio, we understand the importance of staying connected and maintaining team collaboration. We are continually mindful of this as we grow and have the tools and support in place to help us all flourish.

We are a fast-growing B2B iGaming startup whose mission is to power the future of iGaming with scalable, secure, and lightning-fast technology. We’re passionate about reliability, speed, and user experience, and we need an SRE who shares our obsession.

Your Mission

As a Site Reliability Engineer, you will own and operate our full observability and monitoring stack, ensuring our systems are reliable, scalable, and performant. You will collaborate closely with development and operations teams to automate processes, reduce manual toil, and implement engineering-driven solutions that balance innovation velocity with operational stability.

What You’ll Do

  • Design, implement, and maintain monitoring, alerting, and logging systems (Prometheus, VictoriaMetrics, Grafana, OpenSearch, Dynatrace).
  • Define and track Service Level Indicators (SLIs), Service Level Objectives (SLOs), and manage error budgets to measure and improve system reliability.
  • Automate repetitive tasks and build self-healing infrastructure using scripting (Bash, Python) and infrastructure-as-code tools (Terraform, Terragrunt).
  • Ensure Kubernetes (EKS) cluster reliability through health checks, graceful shutdowns, rolling updates, and autoscaling.
  • Develop and maintain CI/CD pipelines using GitLab and Helm charts.
  • Lead incident response, conduct blameless postmortems, and implement preventive measures.
  • Document operational procedures, runbooks, and observability logic; train internal teams on best practices.
  • Participate in 24/7 on-call rotations to maintain service availability.

What You’ll Bring

  • 3+ years of experience working with Linux and AWS environments (AWS certifications a plus).
  • Hands-on experience with observability tools: Prometheus, Grafana, OpenSearch/ELK, VictoriaMetrics, Dynatrace.
  • Familiarity with messaging and database technologies such as Kafka, RabbitMQ, PostgreSQL, Cassandra, Redis, and Elasticsearch.
  • Strong skills in containerization and orchestration: Docker, Kubernetes (EKS), Helm.
  • Proficiency in Bash and Python scripting; experience with Terraform and Terragrunt for infrastructure automation.
  • Solid understanding of CI/CD processes, preferably with GitLab.
  • Knowledge of SRE principles, including SLIs/SLOs, error budgets, capacity planning, and incident management.
  • Excellent communication skills and ability to collaborate across teams.
  • English proficiency at an intermediate level or higher.