Principal Site Reliability Engineer

Playson

Spain

Remote

EUR 70,000 - 90,000

Full-time

Posted today

Vacancy summary

A leading iGaming supplier is seeking a Principal Site Reliability Engineer to manage systems health, provide 24x7 on-call support, and integrate technologies into their Cloud Infrastructure. The ideal applicant will have strong experience in Kubernetes, AWS, and monitoring tools, and will thrive in a proactive role focusing on high-traffic online platforms. This position offers a flexible work schedule and comprehensive benefits including unlimited paid vacation.

Benefits

Quarterly Bonuses
Flexible Work Schedule
Remote Work Option
Comprehensive Medical Insurance
Financial Support for Life Events
Unlimited Paid Vacation
Unlimited Paid Sick Leave
Reimbursement for professional development courses

Qualifications

  • Proficiency in Kubernetes (deployment, scaling, troubleshooting).
  • Experience with configuration management tools like FluxCD/ArgoCD.
  • Strong experience with issue processing (RCA, Postmortems).
  • Familiarity with AWS, Terraform, Docker, CI/CD.
  • Experience with monitoring tools like DataDog, Prometheus, Grafana, and ELK Stack.
  • Strong understanding of networking concepts and protocols.
  • Proficiency in at least one scripting language (Python, NodeJS, Go).
  • Proficiency in Git or other version control systems.
  • Familiarity with incident response tools like PagerDuty, Opsgenie.

Responsibilities

  • Manage day-to-day alerts and system checks.
  • Provide 24x7 on-call support for critical SaaS events.
  • Document issues and remediation steps.
  • Create monitors within the EKS/K8s ecosystem.
  • Deploy to EKS/K8s cluster using Terraform and Helm/Flux.
  • Enhance infrastructure health with checks and scripts.
  • Maintain and develop deployment code.
  • Integrate new technologies into Cloud Infrastructure.
  • Collaborate with teams for support and assistance.
  • Prioritize customer focus in deployments/updates.
  • Conduct RCA and corrective actions to prevent issues.
  • Assign alert-related actions after investigation.
  • Handle support requests for environment-specific actions.

Skills

Kubernetes proficiency
Configuration management tools (FluxCD/ArgoCD)
Experience with issue processing (RCA, Postmortems)
AWS familiarity
Terraform experience
Docker knowledge
CI/CD expertise
Monitoring tools (DataDog, Prometheus, Grafana)
Networking concepts understanding
Scripting language proficiency (Python, NodeJS, Go)
Git knowledge
Incident response tools familiarity (PagerDuty, Opsgenie)

Job description

Founded in 2012, Playson is a leading iGaming supplier recognized worldwide. We provide our customers with a high-end, microservice-based platform as a service that aims to process billions of financial transactions per day. We run a cross-regional setup and aim to drive latency as close to zero as possible. We invest heavily in delivering the best game experience and a smooth connection regardless of the internet coverage and bandwidth of the game clients.

We are currently seeking an experienced Principal Site Reliability Engineer to join our dynamic Platform Tribe.

What you will be doing:
  • Manage day-to-day alerts, system checks, and issue escalation as necessary.
  • Provide 24x7 on-call support for critical SaaS events.
  • Document issues and remediation steps.
  • Proactively create monitors within the EKS/K8s ecosystem.
  • Deploy to EKS/K8s cluster using Terraform and Helm/Flux.
  • Enhance infrastructure health by implementing checks and scripts to address known issues.
  • Maintain and develop deployment code.
  • Implement/integrate new technologies into our Cloud Infrastructure.
  • Collaborate with other teams to provide top-notch support and assistance.
  • Prioritize customer focus in planning deployments/updates, ensuring minimal impact.
  • Conduct RCA and take necessary corrective actions to prevent issue recurrence.
  • Assign alert-related actions to the appropriate team after investigation.
  • Handle support requests for environment-specific actions.

To succeed in this role, you will need:
  • Proficiency in Kubernetes (deployment, scaling, troubleshooting).
  • Experience with configuration management tools like FluxCD/ArgoCD.
  • Strong experience with issue processing (RCA, Postmortems).
  • Familiarity with AWS, Terraform, Docker, CI/CD.
  • Experience with monitoring tools like DataDog, Prometheus, Grafana, and logging solutions like Elasticsearch, Logstash, and Kibana (ELK Stack) or AWS CloudWatch.
  • Strong understanding of networking concepts and protocols.
  • Proficiency in at least one scripting language (e.g., Python, NodeJS, Go).
  • Proficiency in Git or other version control systems.
  • Familiarity with incident response and management tools like PagerDuty, Opsgenie, or VictorOps.
  • Ownership, proactiveness, persistence, and passion for maintaining a high-traffic online platform.

What we offer:
  • Quarterly Bonuses based on transparent and systematic evaluation.
  • Flexible Work Schedule.
  • Remote Work Option for Enhanced Flexibility.
  • Comprehensive Medical Insurance for you and your significant other.
  • Financial Support for Life Events.
  • Unlimited Paid Vacation.
  • Unlimited Paid Sick Leave.
  • Reimbursement for professional development courses and training.

If you're ready to embrace ambitious goals and thrive in a dynamic environment, apply now and become part of Playson's exciting journey in the iGaming world!
