Overview
This is a remote position.
D.Engage is a leading SaaS company dedicated to delivering innovative solutions that drive digital engagement and enhance customer experiences. Our team is passionate about technology and committed to fostering an environment where talent can thrive and grow. We are looking for a SaaS Resilience Manager to join our technology team: someone who is agile, results-driven, customer-obsessed, and loves learning.
This position provides a valuable opportunity for an engineer to enhance their expertise and contribute to impactful projects.
Responsibilities
- Resilience Planning and Strategy: Participate in developing and implementing a comprehensive service resilience strategy for all SaaS products.
- Resilience Planning and Strategy: Participate in designing and maintaining disaster recovery and business continuity plans.
- Resilience Planning and Strategy: Conduct regular risk assessments and impact analyses to identify vulnerabilities and mitigate risks.
- Ownership of Production Environment: Take ownership of, and responsibility for, the production environment, including cloud and on‑premise infrastructure.
- Ownership of Production Environment: Monitor production environments in collaboration with the VP of Development.
- Ownership of Production Environment: Work with the VP of Security to ensure the security of the production environment.
- Team Building and Improvement: Build and lead a high-performing resilience team, continuously improving its quality.
- Team Building and Improvement: Train and improve the quality of technical support teams, including preparing training materials.
- Team Building and Improvement: Provide feedback to teams on problem detection and troubleshooting steps (logging, monitoring, health checks).
- Service Monitoring and Incident Management: Establish and manage robust monitoring systems to detect and respond to service disruptions promptly.
- Service Monitoring and Incident Management: Lead incident response efforts, including root cause analysis, resolution, and post-incident reviews.
- Service Monitoring and Incident Management: Develop and maintain incident response playbooks and procedures.
- Infrastructure and Performance Optimization: Collaborate with IT and engineering teams to design resilient infrastructure and applications.
- Infrastructure and Performance Optimization: Implement redundancy, failover, and load balancing strategies to ensure high availability.
- Infrastructure and Performance Optimization: Continuously monitor and optimize system performance, capacity, and scalability.
- Collaboration and Communication: Assist product and development teams in analysis when necessary.
- Collaboration and Communication: Analyze large-scale bugs and transfer them to the relevant teams.
- Collaboration and Communication: Troubleshoot server problems together with other teams when necessary.
- Collaboration and Communication: Provide regular updates on service resilience status, metrics, and improvements to stakeholders.
- Bug Fixing: Implement small-scale bug fixes (at least 3 years of coding experience required).
- Bug Fixing: Analyze large-scale bugs and coordinate with relevant teams for resolution.
- Compliance and Documentation: Ensure compliance with relevant industry standards and regulations.
- Compliance and Documentation: Maintain comprehensive documentation of resilience strategies, processes, and incident responses.
- Compliance and Documentation: Participate in audits and reviews as required.
Requirements
- Bachelor's degree in Computer Science, Software Engineering, or a related field.
- Proficiency in the .NET framework.
- Strong knowledge of server environments such as AWS, Azure, and independent on-site servers.
- Familiarity with version control tools like Git.
- Experience with highly complex L3 queries and solutions related to servers and their scalability.
- Interest and enthusiasm for technology processes.
- Collaborative skills and a predisposition for teamwork.
- Must be accountable and committed to the job.
- Willingness to learn and adaptability to new technologies.
- Fast learning ability and problem-solving skills.
- Effective communication skills and analytical thinking ability.
Note:
- This role requires the candidate to speak Portuguese.
- Submit your resume in English. Only English resumes will be accepted.
Benefits
- Growth Opportunities: Access to professional development and career growth within a rapidly expanding SaaS company.
- Collaborative Culture: Work in a supportive and innovative environment where your contributions are valued.
- Competitive Benefits: Enjoy a comprehensive benefits package.
D.Engage is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all team members.