Senior Site Reliability Engineer

Hyperdrive Recruiting

Raleigh (NC)

Remote

USD 130,000 - 170,000

Full time

4 days ago

Job summary

A leading financial technology company seeks a Senior Site Reliability Engineer to improve system reliability for its global payments platform. The role involves ensuring operational efficiency, leading incident response, and working with development teams to implement SRE best practices. The position offers competitive compensation, extensive benefits, and 100% remote work flexibility.

Benefits

100% remote flexibility
Competitive salary
Employer-paid healthcare for family
Company-paid life and disability insurance
Open Paid Time Off policy
Matching 401(k) plan
$1,000 annual professional development stipend
Access to company-paid professional coaching

Qualifications

  • Hands-on experience with Datadog, OpenTelemetry, or similar platforms required.
  • Proficiency in Ruby or Elixir; Python accepted.
  • Experience with AWS services and relational databases is essential.

Responsibilities

  • Ensure reliability and performance of the payments platform through monitoring and automation.
  • Collaborate with development teams for improved application reliability.
  • Lead incident response efforts and mentor team members.

Skills

Observability
Monitoring
Automation
Problem-solving

Tools

Datadog
OpenTelemetry
Ruby
Elixir
AWS
PostgreSQL
Kafka

Job description

We are looking for a talented Senior Site Reliability Engineer (SRE) to ensure the reliability, observability, and scalability of a globally distributed payments platform.

We are a financial technology company that provides an open payments platform, enabling and optimizing digital transactions with a comprehensive payment services marketplace.

Job Duties
  • Ensure the reliability, availability, and performance of a globally distributed payments platform, processing billions monthly, through monitoring, automation, and continuous improvement.
  • Collaborate with development teams to improve the reliability and performance of applications.
  • Implement and maintain robust observability solutions, enabling proactive identification, alerting, and resolution of issues.
  • Lead incident response efforts, including root cause analysis and implementation of preventative measures, and participate in a shared on-call rotation to maintain 24/7 system reliability.
  • Develop and maintain automation tools to reduce manual intervention, streamline operations, and enhance developer productivity.
  • Monitor, analyze, and optimize the performance of relational databases, identifying and resolving bottlenecks.
  • Lead by example, promoting modern SRE best practices and fostering a culture of reliability and performance.
  • Provide technical guidance and mentorship to team members.
Ideal Background
  • Hands-on experience with Datadog, OpenTelemetry, Sentry, Sumo Logic or similar monitoring and observability platforms.
  • Proficiency in a modern programming language, with a proven ability to write clean, maintainable, and efficient code; experience with Ruby, Rails, and Elixir is preferred (Python is also accepted).
  • Experience with AWS services, including EC2 (Ubuntu Linux), S3, and RDS.
  • In-depth knowledge of relational databases (e.g., CockroachDB, PostgreSQL, Riak) with experience in performance optimization and query tuning; experience with Kafka is a plus.
  • Experience applying design patterns to enhance reliability, scalability, and performance in application development.
  • Excellent problem-solving skills with experience diagnosing complex system issues in production environments.
  • Proven ability to work cross-functionally with product, application, infrastructure, and security engineering teams.
  • Strong written and verbal communication skills, with the ability to explain complex technical concepts.
Why Us
  • 100% remote flexibility (not available to candidates located in California or New York).
  • Competitive salary of $130,000 - $170,000 base + equity.
  • Outstanding medical and dental benefits, including 100% employer-paid healthcare for the whole family.
  • Company-paid life and disability insurance.
  • Optional vision and supplemental insurance options, and various Flexible Spending Accounts (FSA).
  • Open Paid Time Off policy and 12 weeks of paid leave for new parents.
  • Matching 401(k) plan (5% up to $5,000 yearly).
  • $1,000 annual professional development stipend.
  • Monthly home working/digital lifestyle stipend, new MacBook, and one-time accessory reimbursement.
  • LinkedIn Learning subscription.
  • Access to company-paid professional coaching service.
  • Opportunities for remote employees to visit HQ in Durham, North Carolina.