Lead SRE Engineer

Screening Eagle Technologies

Málaga

On-site

EUR 50,000 - 70,000

Full-time

Posted 30+ days ago

Job description

An innovative firm is seeking a Site Reliability Engineer Lead to guide a talented team in ensuring the stability and scalability of cloud services. This role offers the opportunity to leverage cutting-edge technologies and practices in AWS, Terraform, and Kubernetes, while leading efforts in automation, testing, and engineering. You will play a crucial role in enhancing monitoring systems, optimizing resources, and driving process improvements. If you are passionate about technology and thrive in a collaborative environment, this is the perfect opportunity to make a significant impact on the company's success.

Qualifications

5+ years of experience in AWS cloud infrastructure development.
Expert-level proficiency in Terraform and Kubernetes.

Responsibilities

Lead a team of SREs to ensure service stability and scalability.
Design and implement cloud infrastructure while optimizing costs.

Skills

AWS Cloud Infrastructure

Terraform

Kubernetes

DevOps Practices

Non-Functional Testing

Git and GitOps

Monitoring Tools (ELK, Prometheus, Grafana)

MLOps

Cost Optimization

Agile Methodologies

Tools

AWS (EC2, S3, VPC, IAM)

CI/CD Pipelines

Logging and Monitoring Tools

The Site Reliability Engineer Lead (SRE Lead) at Screening Eagle will lead a team of SREs to ensure the stability, resilience, and scalability of our services through automation, testing, and engineering. This role involves leveraging expertise from product systems/operations, cloud infrastructure (AWS), build and release engineering, software development, and stress/load testing to guarantee our services are available, cost-efficient, and fit for purpose from the early stages of development. The role requires 5+ years of experience developing AWS cloud infrastructure and 7+ years of experience leading teams.

What you will do

Cloud Infrastructure Management and Networking

Design, develop, and implement cloud infrastructure using Terraform.
Optimize resources for cost-efficiency and performance.
Ensure infrastructure security and implement service control policies (e.g., Control Tower).
Configure AWS VPC flow logs, load balancer logging, Direct Connect, AWS VPN, Transit Gateway (TGW), etc.

Monitoring, Support, and Prototyping

Implement robust monitoring and alerting systems.
Set up and monitor CI/CD pipelines both on-premises and in the cloud.
Enhance monitoring, logging, and alerting practices.
Use tagging and cost categorization for cost analysis.
Create prototypes and lead development teams in implementing solutions.

Team Leadership, Collaboration, and Documentation

Lead the SRE team, ensuring technical quality and best practices.
Guide the team through the software development lifecycle.
Collaborate with developers and operations to integrate infrastructure changes.
Document DevOps changes, technical partnerships, design, integration, testing, and deployment.

Innovation, Quality Assurance, and Process Improvement

Evaluate risks, customize applications, and lead quality practices.
Focus on agile methodologies, test automation, and continuous integration.
Simplify and automate complex processes to ensure quality and operational excellence.
Improve the DevOps toolchain and streamline software delivery processes.
Stop projects/products if solutions are not technically acceptable.

What we expect

Extensive experience in implementing and evolving DevOps practices across multi-disciplinary teams and business frameworks.
Strong background in leading technology change programs and managing projects.
In-depth knowledge and experience with AWS services (EC2, S3, VPC, IAM, etc.).
Expert-level proficiency in Terraform, including writing reusable modules and leveraging best practices.
Highly skilled with Kubernetes, Terraform, serverless and AWS in general.
Proficient in non-functional testing, including performance, security, and cost optimization.
Experience working with advanced architectures such as ARM and AWS Graviton, optimizing for performance, cost-efficiency, and scalability.
Knowledge of Kubernetes operator development, including operators for GPU-based architectures.
Competent with build tools and practices for multiple CPU architectures.
Expertise in Git and GitOps philosophy.
Expert in logging and monitoring tools like ELK, Prometheus, and Grafana.
Demonstrable MLOps experience.
Ability to quickly gain domain knowledge.
Operational experience in maintaining applications.