Senior Site Reliability Engineer (Remote - US)

Jobgether

United States

Remote

USD 120,000 - 160,000

Full time

3 days ago

Job summary

A leading technology platform seeks a Senior Site Reliability Engineer to oversee cloud infrastructure improvements and ensure system scalability. The role covers proactive reliability work, Kubernetes cluster management, AWS architecture, and incident response. The ideal candidate will have significant experience in SRE or DevOps, along with strong problem-solving and communication skills suited to a collaborative environment.

Benefits

Comprehensive health, dental, and vision coverage
Flexible Time Off (unlimited)
Paid family and medical leave
Retirement saving plans
Home office setup allowance
Annual professional development stipend

Qualifications

  • Minimum of 5 years of experience in SRE, DevOps, or Infrastructure Engineering.
  • Expertise in PostgreSQL administration and AWS services.
  • Strong communication skills and ability to document processes.

Responsibilities

  • Own initiatives related to system reliability and scalability.
  • Design, deploy, and manage Kubernetes clusters.
  • Conduct post-incident reviews and improve long-term system practices.

Skills

Kubernetes
AWS
Python
Bash
Observability

Tools

Terraform
Crossplane
GitHub Actions
ArgoCD

Job description

About Jobgether

Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.

One of our companies is currently looking for a Senior Site Reliability Engineer in the United States.

As a Senior Site Reliability Engineer (SRE), you will play a key role in scaling, securing, and improving the organization's cloud infrastructure. Your primary focus will be ensuring the reliability and scalability of systems by implementing proactive solutions and automating infrastructure management. You’ll work closely with engineering and platform teams to improve service reliability, manage Kubernetes clusters, and optimize cloud resources. You will also lead incident response, conduct post-incident reviews, and refine best practices to continuously improve system performance and security.

Accountabilities:

  • Own initiatives related to system reliability and scalability, identifying potential issues and implementing proactive solutions to prevent them.
  • Participate in on-call rotations, responding to incidents, performing root cause analysis, and driving long-term fixes.
  • Design, deploy, and manage Kubernetes clusters, utilizing tools like Helm charts, Cilium, and Karpenter to optimize both performance and cost.
  • Architect and maintain AWS infrastructure, focusing on RDS/Aurora PostgreSQL, networking, and scaling best practices.
  • Automate infrastructure provisioning using tools like Crossplane and Terraform to maintain consistency and scalability.
  • Enhance observability by improving monitoring systems in Datadog, driving proactive detection and resolution of system issues.
  • Conduct post-incident reviews and document lessons learned, driving improvements into long-term system practices.

Requirements:

  • Minimum of 5 years of experience in SRE, DevOps, or Infrastructure Engineering, demonstrating strong ownership and problem-solving skills.
  • Proficiency in Kubernetes, Helm, and network security practices.
  • In-depth experience with AWS services such as RDS, Aurora, VPC, EKS, EC2, and IAM.
  • Expertise in PostgreSQL administration, including performance tuning and high availability management within AWS.
  • Familiarity with CI/CD tools like GitHub Actions and ArgoCD, with a focus on automation and security best practices.
  • Strong understanding of, and experience with, Infrastructure as Code (IaC) using Crossplane and Terraform.
  • Experience in observability and monitoring with Datadog.
  • Proficiency in Python and Bash scripting for system automation and management.
  • Strong communication skills and the ability to collaborate effectively across engineering teams and document processes in Confluence.

Benefits:

  • Competitive base salary and equity options.
  • Comprehensive health, dental, and vision coverage for you and your family.
  • Life insurance and mental wellness coverage.
  • Flex Time Off (unlimited) in addition to company-paid holidays.
  • Paid family leave, medical leave, and bereavement leave policies.
  • Retirement saving plans to help you plan for the future.
  • Home office setup allowance to customize your work environment.
  • Annual professional development stipend to support your growth.
  • Flexible remote work options with global team collaboration.

Jobgether hiring process disclaimer

This job is posted on behalf of one of our partner companies. If you choose to apply, your application will go through our AI-powered 3-step screening process, where we automatically select the 5 best candidates.

Our AI thoroughly analyzes every line of your CV and LinkedIn profile to assess your fit for the role, evaluating each experience in detail. When needed, our team may also conduct a manual review to ensure only the most relevant candidates are considered.

Our process is fair, unbiased, and based solely on qualifications and relevance to the job. Only the best-matching candidates will be selected for the next round.

If you are among the top 5 candidates, you will be notified within 7 days.
If you do not receive feedback after 7 days, it means you were not selected. However, if you wish, we may consider your profile for other similar opportunities that better match your experience.

Thank you for your interest!

Similar jobs

Site Reliability Engineer

Seer

Remote

USD 100,000 - 300,000

5 days ago

Senior Site Reliability Engineer

General Dynamics Mission Systems

Aurora

Remote

USD 129,000 - 141,000

5 days ago

Senior Site Reliability Engineer

Talent Groups

McKinney

Hybrid

USD 120,000 - 160,000

3 days ago

Senior Site Reliability Engineer (Remote - US)

Jobgether

Remote

USD 120,000 - 160,000

21 days ago

Site Reliability Engineer (Remote - Canada)

Lensa

Remote

USD 64,000 - 720,000

18 days ago

Senior Software Engineer - Platform & Resiliency

Truffle Security Co.

Remote

USD 159,000 - 188,000

8 days ago

Site Reliability Engineer

Charter Global

Remote

USD 100,000 - 150,000

28 days ago

Site Reliability Engineer - Core C++ Team

ClickHouse

Remote

USD 130,000 - 210,000

4 days ago

AWS Cloud Site Reliability Engineer (SRE)

Tandym Group

On-site

USD 100,000 - 125,000

5 days ago