Senior Site Reliability Engineer

Hyperdrive Recruiting

Raleigh (NC)

Remote

USD 130,000 - 170,000

Full time

4 days ago

Job summary

A leading financial technology company seeks a Senior Site Reliability Engineer to enhance system reliability for their global payments platform. This role involves ensuring operational efficiency, leading incident responses, and working with development teams to implement best SRE practices. The position offers competitive compensation, extensive benefits, and 100% remote work flexibility.

Benefits

100% remote flexibility

Competitive salary

Employer-paid healthcare for family

Company-paid life and disability insurance

Open Paid Time Off policy

Matching 401(k) plan

$1,000 annual professional development stipend

Access to company-paid professional coaching

Qualifications

Hands-on experience with Datadog, OpenTelemetry, or similar platforms required.
Proficiency in Ruby or Elixir; Python accepted.
Experience with AWS services and relational databases is essential.

Responsibilities

Ensure reliability and performance of the payments platform through monitoring and automation.
Collaborate with development teams for improved application reliability.
Lead incident response efforts and mentor team members.

Skills

Observability

Monitoring

Automation

Problem-solving

Tools

Datadog

OpenTelemetry

Ruby

Elixir

AWS

PostgreSQL

Kafka

We are looking for a talented Senior Site Reliability Engineer (SRE) to ensure the reliability, observability, and scalability of a globally distributed payments platform.

We are a financial technology company that provides an open payments platform, enabling and optimizing digital transactions with a comprehensive payment services marketplace.

Job Duties

Ensure the reliability, availability, and performance of a globally distributed payments platform, processing billions monthly, through monitoring, automation, and continuous improvement.
Collaborate with development teams to improve the reliability and performance of applications.
Implement and maintain robust observability solutions, enabling proactive identification, alerting, and resolution of issues.
Lead incident response efforts, including participating in a shared on-call rotation to maintain 24/7 system reliability, performing root cause analysis, and implementing preventative measures.
Develop and maintain automation tools to reduce manual intervention, streamline operations, and enhance developer productivity.
Monitor, analyze, and optimize the performance of relational databases, identifying and resolving bottlenecks.
Lead by example, infusing modern SRE best practices and fostering a culture of reliability and performance.
Provide technical guidance and mentorship to team members.

Ideal Background

Hands-on experience with Datadog, OpenTelemetry, Sentry, Sumo Logic or similar monitoring and observability platforms.
Proficiency in a modern programming language, with a proven ability to write clean, maintainable, and efficient code; experience with Ruby, Rails, and Elixir is preferred (Python is also accepted).
Experience with AWS services, including EC2 (Ubuntu Linux), S3, and RDS.
In-depth knowledge of relational databases (e.g., CockroachDB, PostgreSQL, Riak) with experience in performance optimization and query tuning; experience with Kafka is a plus.
Experience applying design patterns to enhance reliability, scalability, and performance in application development.
Excellent problem-solving skills with experience diagnosing complex system issues in production environments.
Proven ability to work cross-functionally with product, application, infrastructure, and security engineering teams.
Strong written and verbal communication skills, with the ability to explain complex technical concepts.

Why Us

100% remote flexibility (candidates located in California or New York are not eligible).
Competitive salary of $130,000 - $170,000 base + equity.
Outstanding medical and dental benefits, including 100% employer-paid healthcare for the whole family.
Company-paid life and disability insurance.
Optional vision and supplemental insurance, and various Flexible Spending Accounts (FSA).
Open Paid Time Off policy and 12 weeks of paid leave for new parents.
Matching 401(k) plan (5% up to $5,000 yearly).
$1,000 annual professional development stipend.
Monthly home working/digital lifestyle stipend, new MacBook, and one-time accessory reimbursement.
LinkedIn Learning subscription.
Access to company-paid professional coaching service.
Opportunities for remote employees to visit HQ in Durham, North Carolina.
