Senior-Level
Remote
Job Description
We are seeking a highly skilled Senior Site Reliability Engineer to join our infrastructure team. This role requires deep expertise in modern infrastructure automation, networking, and security practices. The ideal candidate will have extensive experience with containerization, infrastructure as code, and advanced networking concepts including DNS management and domain fronting techniques.
Responsibilities
- Design, build, and maintain robust CI/CD pipelines from development to production
- Automate application builds, testing, and deployment workflows
Infrastructure Management & Automation
- Design, implement, and maintain scalable infrastructure using Infrastructure as Code principles
- Automate deployment, configuration, and management of cloud or on-premise infrastructure
- Manage and optimize containerized applications and orchestration platforms
- Implement and maintain CI/CD pipelines for reliable software delivery
Networking & Security
- Design and implement robust networking solutions including load balancing, reverse proxies, and traffic management
- Configure and manage DNS infrastructure and domain routing, and apply advanced networking techniques where needed
- Implement security best practices across infrastructure layers
- Monitor and respond to security incidents and network anomalies
System Reliability & Performance
- Ensure high availability and reliability of production systems
- Implement comprehensive monitoring, alerting, and observability solutions
- Conduct capacity planning and performance optimization
- Lead incident response and post-mortem analysis
- Plan and execute complex infrastructure migrations between cloud providers
- Evaluate and implement new cloud services and technologies
- Optimize cloud costs while maintaining performance and reliability
Skills & Experience
Core Infrastructure Technologies
- Traefik: Expert-level knowledge of configuration, routing rules, middleware, and advanced features
- Docker: Deep understanding of containerization, multi-stage builds, networking, and security best practices
- Terraform: Advanced Infrastructure as Code implementation, module development, and state management
- Ansible: Extensive automation experience including playbook development, roles, and complex orchestration
Networking & Security Expertise
- DNS Management: Advanced DNS configuration, zone management, and troubleshooting
- IP Networking: Deep understanding of TCP/IP, subnetting, VLANs, and network protocols
- Domain Fronting: Knowledge of CDN-based domain fronting techniques and traffic obfuscation
- Network Security: Firewall configuration, VPN setup, network segmentation, and security hardening
Programming & Development
- Proficiency in at least one programming language (Python, Go, Bash, or similar)
- Experience with API development and integration
- Understanding of software development lifecycle and DevOps practices
- Experience with version control systems (Git) and collaborative development workflows
- Kubernetes: Container orchestration, cluster management, and cloud-native applications
- Cloud Platforms: Experience with major cloud providers (AWS, GCP, Azure, Hetzner, etc.)
- Linux Systems: Advanced system administration and troubleshooting
- Monitoring & Observability: Prometheus, Grafana, ELK stack, or similar tools
Experience
- 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
- 3+ years of hands-on experience with containerization and orchestration platforms
- 3+ years of experience with Infrastructure as Code tools
Hard Skills
- Expert-level knowledge of Linux/Unix systems administration
- Strong understanding of networking protocols and security principles
- Experience with configuration management and automation tools
- Proficiency in scripting and at least one programming language
- Experience with incident response and troubleshooting complex systems
Soft Skills
- Strong analytical and problem-solving abilities
- Excellent communication skills for technical documentation and cross-team collaboration
- Ability to work independently and manage multiple priorities
- Experience mentoring junior team members
- Strong attention to detail and commitment to operational excellence
- Opportunity to be part of a global, hyper-growth startup
- Exceptional, innovative, and dynamic work environment
- Full transparency and open employee communication
- Tremendous growth & career advancement opportunities