
Site Reliability Engineer

1872 Consulting

Redwood City (CA)

Remote

USD 120,000 - 175,000

Full time

3 days ago

Job summary

A leading company in the tech industry seeks a talented Site Reliability Engineer to ensure system reliability and performance. The ideal candidate will work closely with development teams to automate and enhance our infrastructure, utilizing tools such as AWS and Terraform. This 100% remote position offers a chance to significantly impact the availability of our services while developing skills in a dynamic environment.

Qualifications

  • 4+ years working with Terraform and AWS.
  • 2+ years using GitLab as a CI tool, Datadog for alerting, and Kubernetes.
  • Proficient in Linux and Unix Shell.

Responsibilities

  • Handle on-call rotations and respond to LeadIQ availability incidents.
  • Improve the deployment process and run infrastructure with AWS and Kubernetes.
  • Document actions and automate processes for repeatability.

Skills

Problem-solving
Communication

Tools

Terraform
AWS
Gitlab
Datadog
Kubernetes
Linux
NodeJS
Go

Job description

Site Reliability Engineer - 100% Remote

Role Summary:

Site Reliability Engineers (SREs) work with different developer teams to keep our systems running smoothly. They are a blend of pragmatic operators and software craftspeople who apply excellent problem-solving and communication skills to develop or configure tools that automate, monitor, and alert on the reliability of internal systems.


What you will be doing:

  • Join the on-call rotation to respond to LeadIQ availability incidents and support developers with customer incidents.
  • Use your on-call shift to prevent incidents from happening. When they do happen, step in, either actively or in support of the engineers.
  • Run our infrastructure with AWS, Terraform, and Kubernetes (EKS).
  • Think about systems - edge cases, failure modes, behaviors, specific implementations.
  • Build monitoring and alerting that trigger on symptoms rather than on outages.
  • Document every action, so your findings turn into repeatable actions, and then into automation.
  • Improve the deployment process to make it as boring as possible.
  • Design, build and maintain core infrastructure pieces that allow LeadIQ to scale to support hundreds of thousands of concurrent users.
  • Debug production issues across services and levels of the stack.
  • Plan the growth of LeadIQ infrastructure.
  • Support engineering teams in defining and building SLIs and SLOs.

The Requirements:

  • 4+ years working with Terraform and AWS
  • 2+ years working with:
    • GitLab (or similar) as a CI tool
    • Datadog (or similar) as an alerting tool
    • Kubernetes
  • Know your way around Linux and the Unix Shell.
  • Programming skills in NodeJS and/or Go

Nice to Haves

  • Experience with the tech stack: Nginx, Docker, Kubernetes, Terraform, Terragrunt, AWS, GitLab, Helm, ArgoCD, Datadog, or similar technologies
  • AWS, Terraform, Kubernetes certifications
