Overview
Nexthink is looking for a Lead Site Reliability Engineer who is passionate about building and running a high-performance cloud platform and about enabling best-in-class site reliability and operations practices. This role will support Nexthink operations globally. The candidate will drive the development of modern, cloud-native SRE processes and oversee the management and operation of Nexthink’s multi-tenant, microservices-based cloud platform, which has multiple instances deployed across the globe.
This role involves working closely with cross-functional teams to integrate reliability and security into our systems and ensure they meet security and compliance standards. The ideal candidate will have extensive experience in both software engineering and systems administration, with a strong understanding of SRE concepts, requirements, and security practices.
Responsibilities
- Lead, mentor, and develop a team of India-based Site Reliability Engineers.
- Foster a culture of continuous improvement, collaboration, and innovation.
- Oversee the design, deployment, and management of scalable and secure cloud infrastructure.
- Drive automation of infrastructure provisioning, configuration, and management using Infrastructure as Code (IaC) tools.
- Develop and maintain comprehensive monitoring, logging, and alerting systems to ensure high availability and performance.
- Lead efforts in performance tuning and optimization for applications and infrastructure.
- Ensure that security controls and best practices are implemented and maintained to achieve compliance with relevant standards and certifications.
- Conduct and oversee regular security assessments, vulnerability scans, and penetration testing.
- Collaborate with the compliance team to prepare for and respond to audits.
- Lead incident management efforts, ensuring rapid resolution and thorough root cause analysis.
- Develop and implement strategies for improving incident response and minimizing downtime.
- Work closely with development, operations, and security teams to integrate reliability and security into the software development lifecycle.
- Communicate effectively with stakeholders, providing regular updates on system performance, reliability, and compliance status.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 5+ years of experience in site reliability engineering, DevOps, or a related role, with at least 2 years in a leadership position.
- Proficiency in cloud platforms (AWS, Azure, GCP) and cloud-native services.
- Strong scripting and programming skills (Python, Bash, Go, or similar).
- Experience with Infrastructure as Code (IaC) tools such as Terraform, Crossplane, CloudFormation, or Ansible.
- Knowledge of containerization and orchestration (Docker, Kubernetes).
- Familiarity with CI/CD pipelines and tools (Jenkins, GitLab CI/CD, GitHub Actions, etc.).
- In-depth knowledge of compliance standards (ISO, SOC 2, etc.), their requirements, and related best practices.
- Experience with security tools and practices (SIEM, IDS/IPS, firewalls).
- Understanding of network security, encryption, and secure software development practices.
- Ability to collaborate and communicate effectively with global, multicultural engineering teams across EU and US time zones.
- Ability to report to senior engineering management in a timely and effective manner.
Benefits
- Permanent Contract and a competitive compensation package (including stock options).
- Hybrid work model balancing office and remote work, with a structured approach for new hires to foster connections and onboarding.
- Flexible hours and unlimited vacation (unlimited paid time off on top of the 22 days of holidays we offer), plus 3 company-paid volunteer days.
- Fresh fruit, cookies, and soft drinks.
- Regular company and team events such as Volunteer Days, pizza talks, team-building activities, meetups hosted at the office, and more!
- Bonuses for referring successful hires after three months of continuous employment.
Additional information: Please note that not all the benefits listed above are available for temporary, contract, and internship roles. To ensure you have the most up-to-date information, we recommend checking with your Recruitment Partner.