Sr. Site Reliability Engineer (SRE)

Pendo.io

Sheffield

On-site

GBP 64,000 - 69,000

Full time

30+ days ago

Job summary

Join a dynamic and innovative team at a rapidly growing startup focused on enhancing user experiences with software. As a Site Reliability Engineer, you will play a critical role in maintaining and provisioning cloud infrastructure, ensuring reliability and performance across various product initiatives. Collaborate closely with developers and product managers to design resilient systems and automate processes using cutting-edge technologies like Kubernetes and Google Cloud services. This role offers the chance to make a significant impact while working in a passionate and fun environment, surrounded by talented individuals committed to excellence and diversity.

Qualifications

  • 5+ years of experience in cloud infrastructure and automation.
  • Strong programming skills in Go or Python and experience with Kubernetes.

Responsibilities

  • Automate provisioning, deployment, and monitoring of infrastructure.
  • Debug production issues and maintain operational runbooks.

Skills

Go Programming
Python Programming
Systems Thinking
Performance Analysis
Operational Metrics

Education

Bachelor's Degree in Computer Science

Tools

Ansible
Terraform
Google Kubernetes Engine (GKE)

Job description

The Site Reliability Engineering (SRE) team at Pendo is responsible for provisioning and maintaining cloud infrastructure from development through production for all product initiatives, and working with developers and product managers to ensure that our products are not only reliable and performant, but also cost-efficient. Our platform is built on Google Kubernetes Engine (GKE) and utilizes several other Google technologies such as Memorystore, Cloud Datastore, PubSub, Cloud Functions, BigQuery, and Vertex AI, as well as services from other vendors such as Amazon SES.

In the development process, SREs provide developers with stable and performant CI and release pipelines and development environments to facilitate frequent delivery of new product features. In production, SREs perform Tier 1 on-call and incident management functions, supporting a high-throughput platform that processes more than 15 billion events per day. To ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design systems that balance cost with reliability objectives. Additionally, SREs collaborate with the Information Security team to ensure that cloud infrastructure is properly secured and that sufficient controls are in place to meet our compliance goals with respect to industry standards such as SOC 2.

Role Responsibilities

  • Write high-quality infrastructure-as-code that automates the provisioning, deployment, scaling, and monitoring of Pendo’s infrastructure to ensure that it is reliable and performant.
  • Write maintainable code for product functionality with a primary emphasis on operations, scale, resiliency, and monitoring.
  • Work with other engineers to ensure that new services are well designed and properly monitored, with well-defined SLIs and achievable SLOs.
  • Debug production issues, learn to mitigate them quickly, and find ways to prevent them.
  • Maintain runbooks for manual tasks and replace those runbooks with automation whenever possible.
  • Proactively track our capacity, quotas, and other performance limits to plan for growth.
  • Participate in a 24x7 on-call rotation to handle product availability issues as well as urgent customer support escalations.

Minimum Qualifications

  • Bachelor's Degree in Computer Science or related technical field.
  • Minimum of five (5) years of professional technical experience.
  • Experience working with cloud infrastructure using tools such as Ansible or Terraform.
  • Strong programming skills in a language such as Go or Python, and a willingness to learn new languages as needed.
  • Ability to think and talk about systems in terms of possible failure modes, bottlenecks, etc.
  • Good number sense for discussing performance analysis, cost analysis, and operational metrics.

Preferred Qualifications

  • Minimum of five (5) years of experience as a Site Reliability Engineer or DevOps Engineer.
  • Experience designing, analyzing, and troubleshooting distributed systems.
  • Experience maintaining Kubernetes clusters in a production environment.

Pendo Description

Pendo was founded in 2013 by former product managers, who combined their heads and hearts to build something they wanted but never had as product managers -- a simple way to understand and attack what truly drives product success. Our mission is to improve society's experience with software.

Come join one of the fastest-growing startups, supported by best-in-class institutions like Battery Ventures, Salesforce Ventures, Spark Capital and Meritech. You will gain experience in a diverse and exciting set of technologies and clients and have a real impact on Pendo's future. Our culture is passionate, dynamic, and fun.

EEOC

We are an equal opportunity employer and believe having diverse teams where everyone brings their whole self to Pendo is key to our success. We welcome all people of different backgrounds, experiences, abilities and perspectives.

Accessibility

Pendo is committed to working with, and providing access and reasonable accommodation to, applicants with mental and/or physical disabilities. If you think you may require an accommodation for any part of the recruitment process, please send a request to: accommodation@pendo.io. All requests for accommodations are treated discreetly and confidentially, as practical and permitted by law.

Compensation

Our salary ranges are based on paying competitively for our size and industry, and are one part of many compensation, benefits and other reward opportunities we provide.

The expected salary range for this role to be performed in Sheffield, UK is £64,000 - £69,000.

Individual pay rate decisions, including offers made within and over the expected salary range, are based on a number of factors, including qualifications for the role, experience level, skillset, and balancing internal equity relative to peers at the company.
