This range is provided by Tandym Group. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.
Base pay range
$70.00/hr - $75.00/hr
A Virginia-based services company is seeking an experienced and motivated AWS Cloud Site Reliability Engineer (SRE) to join its dynamic team. As an AWS Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of the company's cloud infrastructure on Amazon Web Services (AWS).
Responsibilities:
- Design, implement, and manage infrastructure as code (IaC) solutions using tools like AWS CloudFormation, Terraform or Helm Charts to automate deployment and scaling processes
- Implement robust monitoring and alerting systems to proactively identify and address potential issues before they impact system performance
- Conduct performance analysis and optimization of AWS infrastructure components to enhance system efficiency and reduce latency
- Participate in on-call rotations to respond to and resolve incidents promptly
- Conduct post-incident reviews to identify root causes and implement preventive measures
- Work closely with the Security teams to implement and enforce best practices for securing AWS environments
- Facilitate clear communication across teams, providing updates on release status, known issues, and any potential impact on stakeholders
- Develop and maintain automated deployment pipelines using industry-standard tools such as AWS CI/CD, GitLab CI/CD, Jenkins, or similar
- Proactively identify areas for process improvement within the release management lifecycle
- Collaborate with QA teams to establish and execute release validation procedures
- Perform other duties, as needed
Qualifications:
- Proven experience as a Site Reliability Engineer or similar role
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent work experience)
- In-depth knowledge of AWS services and expertise in managing cloud infrastructure
- Proficiency in scripting languages (e.g., Python, Bash) for automation tasks
- Strong understanding of DevOps principles and continuous integration/continuous deployment (CI/CD) pipelines
- Proficiency in CI/CD tools such as AWS CI/CD, GitLab CI/CD, or others
- Familiarity with infrastructure as code (IaC) tools like CloudFormation, Terraform, Helm Charts, Morpheus, or similar technologies
- Hands-on experience with version control systems (AWS CodeCommit, Git, SVN) and branching strategies
- Experience with containerization and orchestration tools (e.g., Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), Docker, Kubernetes)
- Familiarity with monitoring tools (e.g., CloudWatch, Prometheus) and log analysis
- Solid understanding of Agile methodologies and their application in release management
- Excellent problem-solving and troubleshooting skills
- Strong communication and collaboration skills
Desired Skills:
- Relevant certifications in DevOps or related fields
Desired Skills and Experience
AWS GovCloud, GitLab, AWS, DevOps, Kubernetes, Docker, infrastructure, automation, Ansible, Puppet, ECS
Seniority level
Entry level
Employment type
Job function
Information Technology
Industries
Technology, Information and Internet