Senior Site Reliability Engineer

Abnormal Security

United States

Remote

USD 170,000 - 200,000

Full time

Today

Job summary

Abnormal Security is seeking a Senior Site Reliability Engineer to enhance the reliability and scalability of their systems. This role involves leading initiatives for operational excellence, mentoring engineers, and collaborating with product teams to improve service ownership. Ideal candidates will have extensive experience in SRE, strong programming skills, and the ability to influence system design.

Qualifications

8+ years of experience in infrastructure, DevOps, or SRE roles.
Deep knowledge of production-grade distributed systems and cloud-native architectures.
Strong programming skills in Python, Go, or similar languages.

Responsibilities

Own the operational maturity of services in the SRE software stack.
Lead incident reviews and root cause analyses.
Mentor other engineers and drive adoption of SRE principles.

Skills

Distributed systems

Operational excellence

Programming in Python

Programming in Go

Kubernetes

Terraform

Observability tools

About the Role

Abnormal Security is looking for a Senior Site Reliability Engineer (SRE) to join our Infrastructure team. In this role, you will be responsible for the reliability, scalability, and operational excellence of our systems and services. You will lead initiatives to improve the operational maturity of both SRE-managed services and critical product systems, driving change across the organization in support of stable operations.

As a senior member of the team, you will independently define and execute quarterly goals, create forward-looking roadmaps, and own cross-functional projects aligned with company-level objectives. You will serve as a key advocate for reliability, providing technical leadership, deep analysis, and mentorship while embedding with product teams as needed to improve service ownership and incident response practices.

The ideal candidate:

Has strong technical depth in distributed systems and operational excellence
Possesses a product-focused mindset with the ability to translate business needs into reliability goals
Is a strong communicator and mentor, able to influence both within the SRE team and across engineering
Has demonstrated experience leading broad technical initiatives across teams and systems

What You Will Do

Own the operational maturity of services in the SRE software stack, driving architectural and tooling improvements
Proactively partner with product teams to embed SRE best practices and support services facing operational challenges
Independently define and drive quarterly goals for the SRE team with measurable impact on system reliability and developer productivity
Design and maintain systems that promote observability, automated recovery, scalability, and resilience
Lead incident reviews and root cause analyses; ensure follow-up actions are implemented and shared across teams
Collaborate with engineering leadership to shape the team roadmap and contribute to company-wide reliability goals
Mentor other engineers and drive adoption of SRE principles throughout the engineering organization

Must Have

8+ years of experience in infrastructure, DevOps, or Site Reliability Engineering roles
Deep knowledge of production-grade distributed systems and cloud-native architectures
Demonstrated experience managing service availability, latency, and incident response in production environments
Strong programming skills in Python, Go, or similar languages
Experience with Kubernetes, Terraform, and observability tools (e.g., Prometheus, Grafana, Datadog)
Proven ability to lead complex, multi-team initiatives and influence system design for reliability

Nice To Have

Prior experience embedding with product engineering teams to support operational goals
Familiarity with AWS and multi-cloud environments (e.g., Azure, GCP)
Experience in regulated environments or with FedRAMP-compliant systems
Contributions to open-source SRE tooling or community knowledge sharing

At Abnormal AI, certain roles are eligible for a bonus, restricted stock units (RSUs), and benefits. Individual compensation packages are based on factors unique to each candidate, including their skills, experience, qualifications and other job-related reasons. We know that benefits are also an important piece of your total compensation package. Learn more about our Compensation and Equity Philosophy on our Benefits & Perks page.

Base salary range: USD 170,000 - 200,000
