Site Reliability Engineer

Vallum Associates

United States

Remote

USD 140,000

Full time

4 days ago

Job summary

Join an innovative InsureTech company as a Senior Site Reliability Engineer, where you'll design and maintain secure and scalable systems in a cloud-native environment. You'll influence core infrastructure decisions, mentor junior engineers, and drive process improvements. This is an exciting opportunity with a competitive salary in a flexible remote work culture.

Benefits

401(k)

Equity/Stock Options

Qualifications

5-10+ years in SRE, DevOps, or Infrastructure Engineering roles.
Experience with GCP, GKE, and modern cloud-native technologies.
Strong scripting skills in Python and Bash.

Responsibilities

Architect and manage resilient infrastructure on GCP with Kubernetes.
Automate operational tasks to improve efficiency.
Lead incident response and implement preventive strategies.

Skills

Infrastructure Management

Cloud Technologies

Automation

Monitoring and Observability

Incident Response

Security Practices

Tools

Google Cloud Platform (GCP)

Kubernetes (K8s/GKE)

Grafana

Prometheus

ELK Stack

Python

Bash

This range is provided by Vallum Associates. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

Base pay range

$140,000.00/yr - $140,000.00/yr

Remote - USA

About the Company:

My client is an innovative and fast-growing InsureTech company at the intersection of cybersecurity and insurance. They deliver a unique, proprietary software platform that combines cyber risk mitigation tools, managed SOC support, and cyber insurance into a single, powerful offering for small and midsize businesses. With a mission to provide accessible and affordable digital resilience, they are reshaping how cyber risk is managed for underserved markets.

The Role:

As a Senior Site Reliability Engineer, you will play a critical role in designing, building, and maintaining secure, scalable, and high-performance infrastructure within a cloud-native ecosystem. You will lead initiatives that impact system reliability, operational efficiency, and security posture, while mentoring others and collaborating across engineering teams. This is an exciting opportunity for someone who thrives in a fast-paced, highly technical environment with a strong ownership culture.

Key Responsibilities:

Infrastructure Management:
Architect and manage resilient, scalable infrastructure on Google Cloud Platform (GCP), with a focus on Kubernetes (K8s/GKE), Istio service mesh, YAML, Kustomization, and cert-manager.
Deployment Execution:
Partner with software engineers to plan and deploy software releases. You’ll play a vital role in ensuring releases are efficient, stable, and secure.
Automation & Tooling:
Automate infrastructure and operational tasks using Python, Bash, and related scripting tools to minimize manual overhead and maximize reliability.
Monitoring & Observability:
Leverage tools like Grafana, Prometheus, Loki, and Elastic (ELK Stack) to build proactive monitoring, alerting, and observability solutions.
Incident Response & Reliability Engineering:
Lead root cause analyses and implement preventive strategies to avoid repeat incidents. Champion reliability and scalability across the platform.
Security & Compliance:
Apply best-in-class security practices throughout infrastructure layers. Manage certificate lifecycles and security policies across environments.
Messaging & Integration:
Work with RabbitMQ and other asynchronous messaging systems to support event-driven architectures and distributed service integration.
Mentorship & Collaboration:
Guide junior engineers, contribute to documentation, and collaborate cross-functionally to enhance team productivity and reliability awareness.
Process & Continuous Improvement:
Drive continuous improvement through clear documentation, service mapping, SOPs, and capacity planning. Engage in agile ceremonies to contribute to planning and delivery cycles.

What You Bring:

5–10+ years in SRE, DevOps, or Infrastructure Engineering roles.
Deep experience with GCP, GKE, and modern cloud-native technologies.
Proficiency in scripting languages such as Python, Bash, or similar.
Solid understanding of networking, security, and system administration principles.
Experience with monitoring/logging stacks including ELK, Grafana, Loki, Prometheus, etc.
Exposure to CI/CD pipelines and tools such as FluxCD or ArgoCD is a plus.
Proven ability to own and scale key infrastructure components independently.
Strong interpersonal and documentation skills.

Why Apply?

Be part of a fast-moving startup delivering real impact to underserved businesses.
Join a mission-driven team tackling real-world cybersecurity challenges.
Influence core engineering decisions and infrastructure architecture.
Enjoy a flexible remote work culture and dynamic team environment.

Compensation:

Up to $140,000 USD base salary (DOE)

401(k)
Equity/Stock Options

My client is actively hiring for this role, so if you have the relevant experience and tech stack, please apply now for an immediate response!

Seniority level
Mid-Senior level

Employment type
Full-time

Job function
Information Technology
Industries
IT Services and IT Consulting
