Job Search and Career Advice Platform

Site Reliability Engineering (SRE)

Ant Group

Estepona

On-site

EUR 50,000 - 70,000

Full-time

Posted 3 days ago
Vacancy summary

A leading technology company in Estepona, Spain, is seeking a motivated individual to enhance the reliability of its payment systems. Responsibilities include leading technical initiatives, conducting incident response drills, and analyzing production issues. The ideal candidate has a strong background in Computer Science, proficiency in a programming language such as Java or Python, and experience with cloud platforms. Join us to ensure high availability and stability in a dynamic work environment.

Qualifications

  • Solid knowledge of Computer Science principles.
  • Proficiency in at least one programming language, such as Java, Python, or Shell.
  • Experience with Google Cloud Platform or Oracle Cloud Infrastructure is a plus.

Responsibilities

  • Lead technical initiatives to enhance reliability of payment systems.
  • Conduct routine drills for emergency response.
  • Analyze production issues to provide actionable insights.
  • Design infrastructure solutions for data centers.

Skills

Problem-solving skills
Communication skills
Ownership mentality

Education

Solid knowledge of Computer Science

Tools

Google Cloud Platform
Oracle Cloud Infrastructure
DPDI
Flink
AntSpark
OceanBase

Job description
Overview
  • Ensuring Payment System Stability and High Availability: Lead technical initiatives to strengthen the reliability of our payment systems. This includes designing and implementing monitoring tools, logging frameworks, dashboards, diagnostic utilities, and disaster recovery plans.
  • Incident Handling and Emergency Response: Conduct routine drills, develop contingency strategies, and participate in on-call rotations to ensure rapid response and resolution of production issues across regions.
  • Analyze and Optimize Production Issues: Investigate and analyze real-world production cases, such as performance bottlenecks or system inefficiencies, to derive actionable insights and establish technical best practices. Contribute to the evolution of a highly available and resilient payment architecture.
  • Design and Implement Infrastructure Solutions: Architect and set up new Internet Data Centers (IDCs) to meet scalability and performance requirements. Develop and execute comprehensive data protection plans that adhere to industry standards and compliance requirements, ensuring data integrity and security.
Technical Requirements
  • Solid knowledge of Computer Science, including familiarity with the principles of operating systems (Unix/Linux), storage, networking, and related areas.
  • Proficiency in at least one programming language, such as Java, Python, or Shell, with experience in developing operations and maintenance tools.
  • Strong ability to resolve system problems, good communication skills, and a sense of ownership.
  • Experience operating Google Cloud Platform (GCP) or Oracle Cloud Infrastructure (OCI), OLAP platforms (such as DPDI, Flink, AntSpark), OceanBase (OB), or Ant Trust-Native Service (ATS) is a plus.