This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Site Reliability Engineer in EMEA.
This role offers the opportunity to lead the design, deployment, and operation of complex, scalable lab environments that power hands‑on cybersecurity training and simulations. You will architect hybrid cloud and on‑premises infrastructure, including secure, high‑performance networks and lab environments that support thousands of concurrent users. The position combines deep technical expertise, strategic leadership, and hands‑on engineering to deliver realistic, reliable training experiences. You will collaborate closely with platform architects, security researchers, and engineering teams to optimize performance, cost, and reliability. The role emphasizes automation, infrastructure‑as‑code, and observability, and offers the chance to influence best practices across the organization while driving large‑scale infrastructure projects.
Accountabilities
- Design and architect complex global lab environments using OpenStack and hybrid cloud solutions, ensuring scalability, resilience, and security.
- Develop and implement infrastructure standards, patterns, and best practices for lab deployment.
- Optimize lab performance and resource utilization for thousands of concurrent users, including workspace‑based and private lab sessions.
- Lead infrastructure migrations, cost optimization initiatives, and high‑availability implementations.
- Implement automation using Infrastructure‑as‑Code tools (Terraform, Ansible) and create self‑service capabilities for teams.
- Establish observability, monitoring, logging, and disaster recovery procedures to maintain uptime and performance.
- Design network architectures, including VPNs, VLANs, software‑defined networking, and security controls appropriate for intentionally vulnerable lab environments.
- Collaborate with cross‑functional teams, provide mentorship, and contribute to architectural reviews and process improvements.
Requirements
- 4+ years of experience in Site Reliability Engineering (SRE) or Infrastructure Architecture roles.
- 2+ years in a senior or lead technical role with architectural responsibilities.
- Production experience with OpenStack and with hybrid cloud environments built on public cloud platforms (AWS, Azure, GCP).
- Advanced knowledge of Linux and Windows Server environments.
- Expertise in networking, including TCP/IP, routing protocols, VPNs, firewalls, and network security.
- Proficiency in Infrastructure‑as‑Code tools (Terraform, CloudFormation, ARM templates) and configuration management tools (e.g., Ansible).
- Experience with containerization and orchestration (Docker, Kubernetes).
- Proven track record in large‑scale distributed systems design, high availability, disaster recovery, and cost optimization.
- Strong strategic thinking, analytical problem‑solving skills, and a proactive approach to infrastructure challenges.
- Excellent communication and leadership skills, with experience mentoring engineers and driving adoption of technical standards.
- Background in cybersecurity, penetration testing, or vulnerability research environments is a plus.
Benefits
- Competitive compensation and flexible work arrangements.
- Fully remote work with flexible hours.
- Opportunity to work on large‑scale, high‑impact cybersecurity lab environments.
- Professional development and training opportunities.
- Dynamic and inclusive global team environment.
- Paid time off and home office support.