Senior Site Reliability Engineer

Exabeam

United States

Remote

USD 90,000 - 150,000

Full time

2 days ago

Job summary

An innovative cybersecurity firm is seeking a Site Reliability Engineer to ensure the reliability and scalability of its products. This role involves maintaining a 24x7 production environment, automating infrastructure management, and collaborating with software engineering teams. Ideal candidates will have a strong background in systems programming and automation, along with a passion for clean, well-documented systems. Join a forward-thinking company dedicated to creating disruptive products that enhance security operations for clients worldwide. If you thrive in a dynamic environment and love solving complex problems, this opportunity is for you!

Qualifications

  • 7+ years of relevant technical experience in systems reliability.
  • Fluent in at least one scripting language used by DevOps professionals.

Responsibilities

  • Maintain 24x7 production environment with high service availability.
  • Automate infrastructure management and maintenance processes.
  • Create and monitor dashboards for key infrastructure metrics.

Skills

Python
Java
Linux
Automation
Problem Solving
Collaboration

Education

BS in Computer Science
Equivalent Experience

Tools

Containers/Namespaces
System Automation Tools

Job description

Exabeam is a leader in intelligence and automation that powers security operations for the world’s smartest companies. As a global cybersecurity innovator, Exabeam provides industry-proven, security-focused, and flexible solutions for faster, more accurate threat detection, investigation, and response (TDIR). Learn more at www.exabeam.com.

You’re someone who enjoys being directly accountable for the reliability of a business-critical, large-scale enterprise system. You’re comfortable guiding and making decisions with limited information, and you can weigh the trade-offs between solving for immediate needs and building larger-scale solutions. You might be considered a subject matter expert in systems reliability, and you feel rewarded by working to develop an operability culture in a quickly growing and changing environment. You’re comfortable owning a wide and diverse set of problem areas and are willing to go out of your lane to effect change. You may have developed one or more metrics, log aggregation, or performance analysis systems in your career.

This is a fantastic opportunity to work and collaborate closely with our software engineering, architecture, and operations teams at Exabeam. Our Site Reliability Engineers are responsible for ensuring Exabeam products and services are highly available, reliable, secure, and scalable. The ideal candidates are fluent in systems programming and/or automation and can leverage their experience to solve complex problems associated with running multi-tenant production environments at massive scale. We’re creating cool, disruptive products. Come join us!

What You'll Do
  1. Maintain 24x7 production environment with a high level of service availability. Perform quality reviews and manage operational issues.
  2. Create and monitor dashboards and alerts for key infrastructure metrics and business KPIs that relate to site reliability. Configure monitoring and alerting to trigger on symptoms rather than on outages.
  3. Ensure services are designed with 24/7 availability and operational readiness and rigor.
  4. Develop processes, tools, automation, and software changes to address operational issues.
  5. Automate infrastructure management and maintenance with the aim of empowering the team and ensuring site reliability.
  6. Implement automation and orchestration for manual processes required to operate and deploy cloud services, working closely with advanced technology teams.
  7. Document every action so your findings turn into repeatable actions, and then into automation.
  8. Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems.
  9. Resolve product/service defects or design changes, infrastructure changes, or operational issues.
  10. Identify, evaluate, and execute preventive measures to minimize/avoid impact to the customer experience, proactively rather than reactively.

Who You Are
  • A self-starter comfortable working independently without extensive supervision.
  • A software engineer with curiosity for operations, or an operations engineer eager to collaborate with software engineers to improve response times, scalability, and availability.
  • Obsessive about clean, well-documented, and comprehensible systems and scripts.
  • Inclined to automate problems away rather than repeat manual work.
  • Collaborative and eager to empower the engineering team.
  • Fluent in at least one scripting language used by DevOps professionals (Python, Perl, PHP, Ruby) and Java.
  • Passionate about learning new technologies or languages.
  • Experienced with scalable web architectures.
  • Calm under crisis and effective in complex problem resolution.
  • Experienced with Linux, containers/namespaces, and system automation tools for Unix and/or cloud platforms.
  • 7+ years of relevant technical experience.
  • BS in Computer Science, Computer Engineering, Math, or equivalent experience.
