We have a 12-month contract-to-hire position for an SRE with a strong background in AWS, CloudWatch, Terraform, CloudTrail, and GitLab for CI/CD.
The position is 100% remote, and candidates can be located anywhere in the US.
The main focus of the position will be implementing IaC, building CI/CD pipelines, and implementing observability as code.
Overview of work:
Provide Site Reliability Engineering (SRE) services to support the secure migration and development of the Provider Chasey and API Chasey platforms into AWS. This initiative aims to ensure high availability, performance, and security of critical infrastructure while implementing Infrastructure as Code (IaC), CI/CD pipelines, and observability as code. The scope includes:
- Design and implementation of observability and monitoring frameworks for Web Portals and APIs
- Migration of Provider and API Chasey platforms to AWS
- Implementation of secure infrastructure using Terraform or equivalent IaC tools
- Establishment of CI/CD pipelines for automated deployment and testing
- Security hardening of Web Portals and API endpoints
- Definition and enforcement of service-level objectives (SLOs) and key performance indicators (KPIs)
- Development of runbooks, automation scripts, and incident response procedures
Job Title: SRE Engineer
Key Responsibilities:
- Develop and maintain modular Infrastructure as Code (IaC) using Terraform and CloudFormation.
- Validate that cloud platforms across AWS are secure and scalable.
- Automate and streamline application onboarding processes.
- Design and implement VPCs, IAM policies, ingress/egress rules, TLS encryption, and service mesh for networking and security.
- Manage Kubernetes and container orchestration using Helm, kubectl, and GitOps.
- Deploy services using blue/green and canary deployment strategies.
- Utilize Git for version control, including branching, merging, and resolving conflicts.
- Create and manage CI/CD pipelines using GitLab CI/CD, including jobs and stages.
- Automate tasks using scripting languages like Bash and Python.
- Implement containerization using Docker and Kubernetes.
- Implement monitoring and logging using tools like Splunk, Dynatrace, and CloudWatch.
- Adhere to security best practices.
- Set up and manage GitLab Runners for executing CI/CD jobs.
- Use GitLab's API for automation and integration with other tools.
- Utilize monitoring and observability tools such as Dynatrace, Splunk, AWS CloudWatch, Uptime.com, and ObservePoint.
- Manage logs efficiently using tools like Splunk and Dynatrace SaaS.
- Apply OpenTelemetry fundamentals.
- Differentiate signal vs. noise in platform/product cardinality.
- Visualize service interactions and latency issues across microservices using distributed tracing tools like Dynatrace.
- Understand and build SLIs and SLOs.
- Implement automation and IaC using Ansible and Terraform.
- Utilize AutoOps/ChatOps.
- Develop and maintain runbooks.
Required Qualifications:
- Proficiency in Git, including branching, merging, and resolving conflicts.
- Experience with GitLab CI/CD, including creating and managing pipelines, jobs, and stages.
- Knowledge of scripting languages like Bash and Python for automating tasks.
- Understanding of Docker and Kubernetes for container orchestration and management.
- Proficiency in using monitoring tools like Splunk and Dynatrace.
- Knowledge of security best practices.
- Experience with log management tools such as Splunk and Dynatrace SaaS.
- Understanding of OpenTelemetry fundamentals.
- Familiarity with distributed tracing tools like Dynatrace.
- Ability to build and understand SLI/SLOs.
- Experience with automation and IaC using Ansible and Terraform.
- Familiarity with AutoOps/ChatOps.
- Ability to develop and maintain runbooks.
Seniority level: Mid-Senior level
Job function: Other
Industries: IT Services and IT Consulting