Site Reliability Engineer - Remote

PayNearMe

Santa Clara (CA)

Remote

USD 175,000 - 195,000

Full time

3 days ago

Job summary

PayNearMe seeks a Site Reliability Engineer to enhance systems and infrastructure for application reliability and performance. You will automate processes, manage scalable infrastructure, utilize tools like Terraform and Datadog, and collaborate with development teams. This role requires a strong background in SRE and DevOps methodologies.

Benefits

Stock options with standard startup vesting
$50 monthly communication stipend
$250 stipend for WFH setup
Premium medical benefits including vision and dental
Paid parental bonding leave
401k plan
Flexible Time Off
Volunteer Time Off
13 scheduled holidays

Qualifications

  • 3+ years of experience in SRE or DevOps.
  • Strong experience with Kubernetes and Docker.
  • Proficient with Terraform for infrastructure management.

Responsibilities

  • Design, implement, and maintain scalable infrastructure.
  • Deploy and manage Kubernetes clusters and containerized applications.
  • Respond to incidents and perform root cause analysis.

Skills

Cloud Platform Experience
Kubernetes and Containers
Scripting and Automation
Monitoring and Observability
DevOps Best Practices
Problem-Solving Ability

Tools

Terraform
GitLab CI
Datadog

Job description

Company Description

PayNearMe develops technology to facilitate the end-to-end customer payment experience, making it easy for businesses to accept, disburse and manage payments. Our modern and reliable platform lowers the total cost of payments by increasing acceptance rates, driving self-service and simplifying exceptions. We future-proof our clients’ payments roadmap by including all payment types and channels through a single contract and integration.

With PayNearMe, businesses can transform the outdated systems holding them back from achieving progress.

PayNearMe has over 200 employees, raised a $45M Series D round in June 2023, and processes billions in payments annually. Headquartered in Silicon Valley, our team is distributed across the U.S. Join us in solving our clients’ biggest payment challenges.

Job Description

As our Site Reliability Engineer, you will design, build, and maintain the systems and infrastructure that power our applications, ensuring their reliability, scalability, and performance. You will bring a software engineering approach to operations, automating processes and continuously improving the infrastructure and tools that support our business needs.

What You’ll Do

  • Infrastructure Management: Design, implement, and maintain scalable and resilient infrastructure using Terraform for infrastructure as code, ensuring high availability and performance.
  • Kubernetes and Containers: Deploy, manage, and optimize Kubernetes clusters and containerized applications using Docker. Implement best practices for container orchestration and management.
  • Systems and Application Monitoring/Observability: Develop and maintain comprehensive monitoring and observability solutions using Datadog. Ensure detailed visibility into system performance and application health.
  • SLO and SLA Management: Define, monitor, and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to ensure reliable and consistent service delivery.
  • Incident Response and Troubleshooting: Respond to incidents, perform root cause analysis, and implement solutions to prevent recurrence. Participate in post-incident reviews and contribute to blameless postmortems.
  • Reliability and Production Environment Management: Ensure the reliability and stability of our production environments. Continuously assess and improve system reliability, identifying and addressing potential points of failure.
  • Automation and Scripting: Develop automation scripts and tools in Python, Bash, or Go to reduce manual intervention and improve system reliability. Implement and improve CI/CD pipelines.
  • CI/CD Pipeline Management: Enhance and maintain continuous integration and continuous deployment pipelines using GitLab CI. Ensure seamless and reliable deployment processes.
  • Capacity Planning and Scaling: Assist in capacity planning and ensure that systems are scalable to meet future demands. Implement auto-scaling strategies where applicable.
  • Security and Compliance: Implement security best practices and ensure compliance with industry standards. Regularly review and update security policies and procedures.
  • Collaboration and Support: Work closely with development teams to ensure reliability and scalability of new features and services. Provide technical support and guidance on infrastructure-related issues.
  • Software Engineering for Operations: Develop and maintain internal tools and services that enhance the efficiency and reliability of our operations.
  • On-Call Rotation: Participate in an on-call rotation to address production issues and collaborate in incident response efforts.

Qualifications

  • Experience: 3+ years of experience in SRE, DevOps, or a related role.
  • Cloud Platform Experience: Proficient with cloud platforms such as AWS, GCP, or Azure. Experience with EC2, RDS, VPCs, and security groups is essential.
  • Kubernetes and Containers: Strong experience with Kubernetes and Docker, including deployment, scaling, and management of containerized applications.
  • Infrastructure as Code: Expert in using Terraform for infrastructure as code. Proficient with configuration management tools such as Ansible, Puppet, or Chef.
  • Monitoring and Observability: Extensive experience with monitoring and observability tools like Datadog, Prometheus, Grafana, ELK stack, or Splunk. Skilled in setting up detailed monitoring and logging systems.
  • SLO and SLA Management: Proven ability to define, monitor, and maintain SLOs and SLAs to ensure reliable service delivery.
  • Scripting and Automation: Strong skills in scripting languages like Python, Bash, or Go. Experience automating repetitive tasks and processes.
  • CI/CD Practices: Familiarity with GitLab CI or a similar tool for continuous integration and deployment. Experience setting up and managing pipelines.
  • Production Environments: Experience supporting production environments running Go or Ruby/Rails applications.
  • Tool Development: Ability to write and update tools to support infrastructure and application management, demonstrating the principle that “SRE is what happens when you ask a software engineer to design an operations team.”
  • DevOps Best Practices: Deep understanding of DevOps principles, practices, and tools to drive continuous improvement in the software development lifecycle.
  • Soft Skills: Strong organizational skills, attention to detail, and the ability to work collaboratively in a team environment. Excellent documentation skills to ensure accurate and detailed records.
  • Problem-Solving Ability: Excellent analytical and problem-solving skills to diagnose and resolve complex system issues quickly and effectively.

Additional Information

Benefits

  • Base salary per year (paid semi-monthly)
  • Fast-paced and professional work culture
  • Stock options with standard startup vesting - 1 year cliff; 4 years total
  • $50 monthly communication expense stipend to go towards your phone/internet bill
  • $250 stipend to enhance your WFH setup
  • Reimbursement for peripheral equipment: monitor (up to $400), keyboard and mouse (up to $200)
  • Premium medical benefits including vision and dental (100% coverage for employees)
  • Company-sponsored life and disability insurance
  • Paid parental bonding leave
  • Paid sick leave, jury duty, bereavement
  • 401k plan
  • Flexible Time Off (our team members typically take off ~3-4 weeks per year)
  • Volunteer Time Off
  • 13 scheduled holidays
  • In-person team meet-ups 4-6 times per year

Salary Range: $175,000 - $195,000

PayNearMe strives to create a workplace where all employees thrive. Our core values represent who we are today, and we take pride in the way we work with each other and with our stakeholders.

We’re in this together to do the right thing. We deliver real results we are proud of while remaining respectful, transparent, and flexible.

PayNearMe is an equal opportunity employer. We are diligently and thoughtfully working towards cultivating a diverse workforce, which, in turn, enhances our products and services for the communities we serve. Applicants who represent all backgrounds are strongly encouraged to apply.

Candidate information will be treated in accordance with our job applicant privacy notice found at: https://home.paynearme.com/ccpa-privacy-notice-jobs-employees/

Assistance for Disabled Applicants

Alternative formats of this Notice are available to individuals with a disability. Please let us know if you need assistance.

All your information will be kept confidential according to EEO guidelines.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology