Join a forward-thinking company as a Site Reliability Engineer, where you'll enhance system reliability and performance. This role is pivotal in ensuring high availability and optimizing operational efficiency through automation and collaboration with development teams. You'll design and implement CI/CD pipelines, utilize Infrastructure as Code practices, and lead incident management efforts. If you're passionate about creating robust systems and improving user experiences, this is the perfect opportunity to make a significant impact in a dynamic environment.
Job Description
Role Description (concept):
The Site Reliability Engineer (SRE) maintains and improves the availability and reliability of systems and applications so that they effectively support business operations and contribute to a positive user experience. The role combines practices from software engineering and operations to create robust and efficient systems.
What does the role involve? (tasks):
Architecture:
- Participate in architecture decisions to ensure system resiliency from the start of software development.
Automation and Orchestration:
- Develop scripts and use tools to automate deployment, infrastructure provisioning, configuration management, and scaling, using CI/CD practices.
- Orchestrate workflows across environments to ensure consistency and reliability.
CI/CD:
- Design, implement, and manage CI/CD pipelines for rapid, reliable code deployment with minimal manual intervention, including automated testing.
Infrastructure as Code:
- Promote the use of IaC tools and practices for reproducible, scalable, and maintainable environments.
Monitoring, Logging, and Alerting:
- Implement monitoring and logging solutions to analyze performance data and generate alerts.
- Use observability data to proactively resolve issues, ensuring high availability.
Performance Optimization:
- Regularly assess system response times and resource use, and optimize them to improve user satisfaction.
Incident Management and Reliability Engineering:
- Participate in on-call rotations, resolve incidents swiftly, and conduct post-mortems to prevent recurrence.
- Develop resilience and recovery strategies to meet SLOs.
Security and Compliance:
- Ensure adherence to security and compliance standards in all operations.
- Conduct security audits and address vulnerabilities.
Quality Assurance (QA):
- Support QA by setting up environments and deploying tools.
- Collaborate with QA teams to automate testing and evaluate the outcomes of non-functional tests.
Responsibilities
Mandatory Skills:
Recommended Skills:
Soft Skills