
Site Reliability Engineer

Two Barrels LLC

United States

Remote

USD 148,000 - 175,000

Full time

2 days ago


Job summary

A leading company is seeking a Site Reliability Engineer to enhance their operational systems and ensure seamless performance. You will work remotely or in select cities to build automation tools, fix issues proactively, and drive a culture of reliability and continuous improvement. This role offers an attractive salary along with comprehensive benefits, great work-life balance, and an opportunity to be a vital part of a dynamic engineering team.

Benefits

Great Wage & Success Meetings

Work From Home comfort package

22 days paid time off annually

Up to 5% 401k employer matching

100% employer-paid medical, dental, and vision

Maternity and Paternity Leave

Flexible hours

Qualifications

5+ years of experience in software engineering.
2+ years in site reliability engineering or DevOps.
Deep experience with infrastructure as code tools.

Responsibilities

Build tools and automation to enhance system reliability.
Respond to incidents and drive improvements.
Ensure systems are steady, secure, and efficiently maintained.

Skills

Cloud platforms

Site reliability engineering

Kubernetes

Docker

Monitoring tools

Database optimization

Performance tuning

Education

Bachelor's degree in Computer Science

Tools

Terraform

CloudFormation

Prometheus

Grafana

Overview:
Two Barrels is looking for a Site Reliability Engineer who can help keep our systems steady, secure, and running like a well-oiled machine (except without actual oil). You'll work closely with our DevOps engineers to build out tools and automation that make things faster, easier, and less painful for everyone.
Your main job? Stop problems before they start. And when something does break (because, let's be real, it will), help us fix it quickly and learn from it so we don't do the same dumb thing twice. We're big on taking ownership here. You won't get blamed for something going wrong, but you will be expected to help make it right.
If you like digging into weird errors, thinking ahead, and making things just work, even when no one notices, this might be your kind of thing.
Location:
Remote | Spokane, WA | Salt Lake City, UT | Austin, TX
Duration:
Full Time
Wage:
Up to $175,000/year
Minimum Qualifications:

Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience.
5+ years of experience in software engineering.
2+ years of experience in site reliability engineering, DevOps, or infrastructure engineering roles.
Deep experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code tools such as Terraform, CloudFormation, or Pulumi.
Strong proficiency with Kubernetes, Docker, and container orchestration in production environments.
Hands-on experience with observability and monitoring tools like Prometheus, Grafana, OpenTelemetry, Sentry, or New Relic.
Proven ability to design and implement highly available, fault-tolerant systems and lead proactive incident response efforts.
Experience with performance tuning, database optimization, and caching strategies (e.g., PostgreSQL, Redis, Memcached).
Demonstrated ability to drive reliability improvements, reduce operational toil, and foster a culture of resilience and continuous improvement.

Preferred Qualifications:

Experience leading reliability-focused initiatives such as post-incident reviews, capacity planning, and root cause analysis.
Experience in site reliability engineering within Ruby on Rails environments.
Familiarity with the Grafana observability stack and related tools (e.g., Alloy, Loki, Tempo, Prometheus).
In-depth experience with AWS services, including ECS, EKS, Route 53, and other related tools.
Proven ability to collaborate across teams to improve service reliability, reduce incident frequency, and drive operational excellence.
Experience troubleshooting and resolving complex production issues, applying SRE best practices to minimize impact and prevent recurrence.
A track record of continuously driving improvements in operational efficiency and system resilience.

Why you might like this job:
You like when things work, and you're the kind of person who quietly fixes things while everyone else is still yelling "It's broken!" You think alerts should be useful, not just annoying background noise, and you enjoy building systems that mostly run themselves (because babysitting servers isn't your idea of fun).
You probably have a bit of a tinkerer's soul. Maybe you've automated your coffee maker or built a Raspberry Pi just to turn your lights purple. You appreciate clean logs, quiet dashboards, and sleep that isn't interrupted by 3 AM calls.
You want to work somewhere that's weird in a good way, where you're trusted to do your job, encouraged to ask "why?", and no one makes you sit through a meeting about synergy.
If that all sounds oddly satisfying, this might be the job for you.
Benefits:

Great Wage & Success Meetings with your manager
Work From Home comfort package & company provided equipment
22 days paid time off annually, PLUS 4 paid holidays
Up to 5% 401k employer matching through Fidelity
100% employer-paid medical, dental and vision for employees
Maternity and Paternity Leave
Flexible hours
Coffee shop next door
Crappy parking? Oh, I mean a cool downtown location for easy public transportation options...
