
Site Reliability Engineer - Observability

Second Front Systems

United States

Remote

USD 160,000 - 180,000

Full time

2 days ago


Job summary

Second Front Systems seeks a Senior Site Reliability Engineer to join its Observability team, deploying and maintaining monitoring infrastructure across DoD networks. Ideal candidates have deep Kubernetes experience and a strong DevSecOps background. The role contributes to national security efforts and offers flexible work options and a competitive salary.

Benefits

100% Healthcare coverage

401(k) with 3% company contribution

Wellness perks

Annual professional development stipend

Flexible paid time off

Parental leave

Referral Bonus

Qualifications

5+ years of Site Reliability Engineering or DevOps experience.
Deep experience with Kubernetes administration, troubleshooting, and scaling.
Ability to work independently in a remote environment.

Responsibilities

Deploy and maintain observability stack across multiple customer clusters.
Build Helm chart abstractions and automation for monitoring deployments.
Collaborate with security teams to ensure compliance with NIST requirements.

Skills

Kubernetes administration

Observability tools

Debugging distributed systems

Collaborative skills

Tools

Grafana

Prometheus

Helm

Terraform

ABOUT THE ROLE

Second Front Systems' (2F) Product team is seeking a highly skilled and motivated Senior Site Reliability Engineer to join our Observability team. We are a small team working to accelerate the deployment of emerging technology into national security use cases. We are seeking technical professionals who want to operate on the front lines of an exciting and disruptive mission.

As a Senior SRE for Second Front Systems, you'll be responsible for deploying, maintaining, and scaling our observability infrastructure across multiple DoD networks. You'll work with Kubernetes-based platforms, BigBang charts from DoD Platform One, and build automation to make our monitoring stack easier to deploy for new customers. You'll be empowered to collaborate with others to implement infrastructure that delivers unique capabilities for our commercial and government customers, including the Department of Defense.

The Observability team is looking for a strong SRE with deep DevSecOps and Kubernetes experience: someone who has deployed and maintained monitoring infrastructure at scale, with an eye for security in highly regulated environments. Experience with DoD software deployments, Platform One, and single-tenant architectures is highly valued.

We are a fast-growing entrepreneurial team working at the convergence of technology and national security. If this type of effort interests you, come join us!

Note: This position requires U.S. citizenship due to government contract requirements.

What You’ll Do

Deploy and maintain observability stack (Grafana, Mimir, Prometheus) across multiple customer clusters and DoD networks
Build Helm chart abstractions and automation to streamline monitoring deployments for new customers
Troubleshoot and debug complex Kubernetes issues, networking problems, and monitoring stack failures
Configure and maintain BigBang charts and DoD Platform One integrations
Design and implement infrastructure automation using tools like Pulumi, ArgoCD, and Flux
Work with Istio service mesh and Keycloak for authentication in secure environments
Monitor and optimize performance of monitoring infrastructure across multiple environments
Collaborate with security teams to ensure compliance with NIST requirements and DoD standards
Participate in on-call rotation and incident response for production environments

Skills You’ll Bring to Our Team

5+ years of Site Reliability Engineering or DevOps experience
Deep experience with Kubernetes administration, troubleshooting, and scaling
Hands-on experience deploying and maintaining observability tools (Prometheus, Grafana, Mimir/Cortex)
Strong understanding of Helm charts, GitOps practices, and CNCF tooling
Experience with service mesh technologies (Istio preferred)
Proven ability to debug complex distributed systems and networking issues
Understanding of authentication systems and security in regulated environments
Ability to work independently and collaborate with team members in a remote environment

Preferred Qualifications

Active security clearance or ability to obtain a Secret-level security clearance
Previous experience with DoD software deployments and Platform One
Experience with BigBang charts and Iron Bank containers
Experience working in national security or highly regulated environments
Familiarity with compliance frameworks (NIST, FedRAMP, etc.)
Experience with infrastructure as code (Pulumi, Terraform)

Technologies We Use

Observability: Grafana stack, Prometheus, custom alerting tools
Kubernetes: Helm, ArgoCD, Flux, Tekton, BigBang charts
Security: Istio, Keycloak, Kyverno
Infrastructure: AWS/GCP/Azure, Pulumi, Git/GitLab
Languages: YAML, Bash, Go

$160,000 - $180,000 a year

Perks & Benefits

This role is full time. As a public benefit corporation, we’re a team of purpose-driven trailblazers transforming the future of U.S. national security. We hire the best to do their best, so we are committed to providing the perks and benefits you need to be successful, both inside and outside the workplace.
We offer you:
Competitive Salary
100% Healthcare, vision and dental coverage
401(k) + 3% company contribution
Wellness perks (Fitness classes, mental health resources)
Equity incentive plan
Tech + office supplies stipend
Annual professional development stipend
Flexible paid time off + federal holidays off
Parental leave
Work from anywhere
Referral Bonus
Visit our careers page to learn more.


