Site Reliability Engineer - Remote / Telecommute
Cynet Systems Inc
Bellevue (WA)
Remote
Full time
Job summary
A leading company is seeking a skilled Site Reliability Engineer to design and maintain scalable AWS infrastructure. The ideal candidate will have expertise in Terraform, Kubernetes, and automation scripting. This role involves collaborating with development teams and ensuring high availability of production systems. If you have a strong background in DevOps and cloud technologies, this opportunity is for you!
Skills
AWS
Terraform
Kubernetes
Python
Bash
Linux
Networking
Troubleshooting
Job Description:
Pay Range: $55/hr - $60/hr
Responsibilities:
- Design, implement, and maintain scalable and secure AWS infrastructure.
- Build and maintain infrastructure as code using Terraform or CloudFormation.
- Manage Kubernetes (EKS) clusters and containerized workloads.
- Develop monitoring and alerting solutions using tools like Prometheus, Grafana, and CloudWatch.
- Support CI/CD pipelines using tools such as Jenkins, GitHub Actions, or CodePipeline.
- Participate in incident response, troubleshooting, and root cause analysis.
- Automate operational tasks through scripting (Python, Bash, etc.).
- Ensure high availability, reliability, and performance of production systems.
- Collaborate with development teams to improve system design and architecture.
Required Skills:
- 3–6 years of SRE/DevOps experience in a production environment.
- Strong hands-on experience with AWS services (EC2, S3, IAM, VPC, RDS, Lambda, etc.).
- Proficient in Terraform or CloudFormation.
- Experience with Docker and Kubernetes (EKS preferred).
- Familiarity with Linux system administration and shell scripting.
- Strong understanding of monitoring, logging, and alerting frameworks.
- Good knowledge of networking concepts (DNS, TCP/IP, Load Balancing).
- Strong troubleshooting and incident management skills.