AWS Cloud Site Reliability Engineer (SRE)

Tandym Group

United States

On-site

USD 100,000 - 125,000

Full time

6 days ago

Job summary

A Virginia-based services company seeks an experienced AWS Cloud Site Reliability Engineer (SRE) to enhance the reliability and performance of its cloud infrastructure. This critical role involves designing infrastructure as code, conducting performance analysis, and ensuring strong collaboration across various teams. Candidates should have a strong background in AWS services, scripting, and proven experience in site reliability engineering.

Qualifications

Proven experience as a Site Reliability Engineer or similar role.
In-depth knowledge of AWS services and cloud infrastructure management.
Proficiency in scripting languages (Python, Bash).

Responsibilities

Design and manage infrastructure as code solutions using AWS tools.
Implement monitoring systems to proactively identify potential issues.
Conduct performance analysis and optimize AWS infrastructure components.

Skills

AWS services

Automation

DevOps principles

Containerization

Monitoring tools

Scripting languages

Problem-solving

Collaboration

Education

Bachelor's degree in Computer Science or Engineering

Tools

AWS CloudFormation

Terraform

Jenkins

GitLab CI/CD

Docker

Kubernetes

This range is provided by Tandym Group. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range

$70.00/hr - $75.00/hr

A Virginia-based services company is seeking an experienced and motivated AWS Cloud Site Reliability Engineer (SRE) to join its team. As an AWS Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of the company's cloud infrastructure on Amazon Web Services (AWS).

Responsibilities:

Design, implement, and manage infrastructure as code (IaC) solutions using tools like AWS CloudFormation, Terraform or Helm Charts to automate deployment and scaling processes
Implement robust monitoring and alerting systems to proactively identify and address potential issues before they impact system performance
Conduct performance analysis and optimization of AWS infrastructure components to enhance system efficiency and reduce latency
Participate in on-call rotations to respond to and resolve incidents promptly
Conduct post-incident reviews to identify root causes and implement preventive measures
Work closely with the Security teams to implement and enforce best practices for securing AWS environments
Facilitate clear communication across teams, providing updates on release status, known issues, and any potential impact on stakeholders
Develop and maintain automated deployment pipelines using industry-standard tools such as AWS CI/CD, GitLab CI/CD, Jenkins, or similar
Proactively identify areas for process improvement within the release management lifecycle
Collaborate with QA teams to establish and execute release validation procedures
Perform other duties, as needed

Qualifications:

Proven experience as a Site Reliability Engineer or similar role
Bachelor's degree in Computer Science, Engineering, or related field (or equivalent work experience)
In-depth knowledge of AWS services and expertise in managing cloud infrastructure
Proficiency in scripting languages (e.g., Python, Bash) for automation tasks
Strong understanding of DevOps principles and continuous integration/continuous deployment (CI/CD) pipelines
Proficiency in CI/CD tools such as AWS CI/CD, GitLab CI/CD, or others
Familiarity with infrastructure as code (IaC) tools like CloudFormation, Terraform, Helm Charts, Morpheus, or similar technologies
Hands-on experience with version control systems (AWS CodeCommit, Git, SVN) and branching strategies
Experience with containerization and orchestration tools (e.g., Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), Docker, Kubernetes)
Familiarity with monitoring tools (e.g., CloudWatch, Prometheus) and log analysis
Solid understanding of Agile methodologies and their application in release management
Excellent problem-solving and troubleshooting skills
Strong communication and collaboration skills

Desired Skills:

Relevant certifications in DevOps or related fields

Desired Skills and Experience

GovCloud, GitLab, AWS, DevOps, Kubernetes, Docker, infrastructure, automation, Ansible, Puppet, ECS

Seniority level
Entry level

Employment type
Contract

Job function
Information Technology

Industries
Technology, Information and Internet
