Senior Site Reliability Engineer

Heli

United States

Remote

USD 120,000 - 180,000

Full time

Yesterday

Job summary

A leading company in the technology sector is seeking a Senior Site Reliability Engineer to join their remote infrastructure team. The ideal candidate should have deep expertise in automation, networking, and security practices, with hands-on experience in containerization and Infrastructure as Code. If you thrive in a dynamic and innovative environment, apply now to take your career to new heights.

Benefits

Exceptional growth & career advancement opportunities
Innovative and dynamic work environment
Full transparency and open communication

Qualifications

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
  • 3+ years of hands-on experience with containerization and orchestration platforms.
  • 3+ years of experience with Infrastructure as Code tools.

Responsibilities

  • Design, build, and maintain robust CI/CD pipelines.
  • Implement security best practices across infrastructure layers.
  • Lead incident response and post-mortem analysis.

Skills

Expert-level knowledge of Linux/Unix systems administration
Strong analytical and problem-solving abilities
Excellent communication skills
Containerization
Infrastructure as Code
Networking protocols

Tools

Docker
Terraform
Ansible
Kubernetes
Prometheus
Grafana

Job description

We are seeking a highly skilled Senior Site Reliability Engineer to join our infrastructure team. This role requires deep expertise in modern infrastructure automation, networking, and security practices. The ideal candidate will have extensive experience with containerization, infrastructure as code, and advanced networking concepts including DNS management and domain fronting techniques.

Senior-Level

Responsibilities
  • Design, build, and maintain robust CI/CD pipelines from development to production
  • Automate application builds, testing, and deployment workflows
Infrastructure Management & Automation
  • Design, implement, and maintain scalable infrastructure using Infrastructure as Code principles
  • Automate deployment, configuration, and management of cloud or on-premise infrastructure
  • Manage and optimize containerized applications and orchestration platforms
  • Implement and maintain CI/CD pipelines for reliable software delivery
Networking & Security
  • Design and implement robust networking solutions including load balancing, reverse proxies, and traffic management
  • Configure and manage DNS infrastructure, domain routing, and advanced networking techniques
  • Implement security best practices across infrastructure layers
  • Monitor and respond to security incidents and network anomalies
System Reliability & Performance
  • Ensure high availability and reliability of production systems
  • Implement comprehensive monitoring, alerting, and observability solutions
  • Conduct capacity planning and performance optimization
  • Lead incident response and post-mortem analysis
  • Plan and execute complex infrastructure migrations between cloud providers
  • Evaluate and implement new cloud services and technologies
  • Optimize cloud costs while maintaining performance and reliability
Skills & Experience
Core Infrastructure Technologies
  • Traefik: Expert-level knowledge of configuration, routing rules, middleware, and advanced features
  • Docker: Deep understanding of containerization, multi-stage builds, networking, and security best practices
  • Terraform: Advanced Infrastructure as Code implementation, module development, and state management
  • Ansible: Extensive automation experience including playbook development, roles, and complex orchestration
Networking & Security Expertise
  • DNS Management: Advanced DNS configuration, zone management, and troubleshooting
  • IP Networking: Deep understanding of TCP/IP, subnetting, VLANs, and network protocols
  • Domain Fronting: Knowledge of CDN-based domain fronting techniques and traffic obfuscation
  • Network Security: Firewall configuration, VPN setup, network segmentation, and security hardening
Programming & Development
  • Proficiency in at least one programming language (Python, Go, Bash, or similar)
  • Experience with API development and integration
  • Understanding of software development lifecycle and DevOps practices
  • Version control systems (Git) and collaborative development workflows
  • Kubernetes: Container orchestration, cluster management, and cloud-native applications
  • Cloud Platforms: Experience with major cloud providers (AWS, GCP, Azure, Hetzner, etc.)
  • Linux Systems: Advanced system administration and troubleshooting
  • Monitoring & Observability: Prometheus, Grafana, ELK stack, or similar tools
Experience
  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
  • 3+ years of hands-on experience with containerization and orchestration platforms
  • 3+ years of experience with Infrastructure as Code tools
Hard Skills
  • Expert-level knowledge of Linux/Unix systems administration
  • Strong understanding of networking protocols and security principles
  • Experience with configuration management and automation tools
  • Proficiency in scripting and at least one programming language
  • Experience with incident response and troubleshooting complex systems
Soft Skills
  • Strong analytical and problem-solving abilities
  • Excellent communication skills for technical documentation and cross-team collaboration
  • Ability to work independently and manage multiple priorities
  • Experience mentoring junior team members
  • Strong attention to detail and commitment to operational excellence
Benefits
  • Be part of a global, hyper-growth startup
  • Exceptional, innovative, and dynamic work environment
  • Full transparency and open employee communication
  • Tremendous growth and career advancement opportunities