Senior Site Reliability Engineer

Rackspace Technology

United States

Remote

USD 80,000 - 130,000

Full time

4 days ago

Job summary

An innovative technology firm is seeking talented individuals to join their Professional Services Center of Excellence. This role focuses on solving complex business problems by enhancing application performance monitoring. You will work with cutting-edge tools, including Datadog and New Relic, to create exceptional customer experiences. The position requires collaboration with development teams to implement robust observability solutions, ensuring system reliability and performance. Join a company recognized for its commitment to diversity and employee satisfaction, where your contributions will shape the future of technology and customer success.

Qualifications

  • 3+ years of experience with AWS EKS and Azure AKS infrastructure.
  • Scripting experience with Python, Go, Bash, and AWS CLI tools.

Responsibilities

  • Implement Observability solutions and maintain scalable systems.
  • Develop monitoring tools, alerts, and dashboards for system health.

Skills

AWS EKS
Azure AKS
Terraform
Kafka
SaaS environments
SRE
Prometheus
Grafana
Datadog
GitOps
Python
Go
Bash
AWS CLI
Disaster recovery strategies

Tools

Kubernetes
ELK

Job description

Rackspace is building up its Professional Services Center of Excellence on Application Performance Monitoring Suites.

If you enjoy solving complex business problems and want to help build the next generation of modern applications for our customers, join us! You will help customers understand the connections between application performance, user experience, and business outcomes, and create amazing customer experiences through modern interpretations of SRE and Observability using Datadog, New Relic, AppDynamics, or Dynatrace.

Rackspace enables businesses to accelerate digital transformation through our innovative data and integration solutions and through tools that help you fix problems quickly, maintain complex systems, and improve code. We believe Datadog, AppDynamics, or New Relic will be significant contributors to our work, and we seek talented, creative, and thoughtful individuals to shape Observability Engineering for our customers.

You Will:
  • Work with customers and implement Observability solutions
  • Build and maintain scalable systems and robust automation supporting engineering goals
  • Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance
  • Proactively gather and analyze metric and log data to perform anomaly detection, performance tuning, capacity planning, and fault isolation
  • Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability, security, and performance standards
  • Document and share solutions collaboratively with team members
  • Maintain a deep understanding of the customer’s business and technical environment
  • Identify performance bottlenecks and anomalous system behavior, and resolve the root causes of service issues

You Need to Have:
  • 3+ years of experience designing, building, and maintaining AWS EKS and Azure AKS infrastructure with Terraform
  • 3+ years' experience with Kafka in large-scale environments with hundreds of terabytes to petabytes of data from numerous endpoints
  • 3+ years of experience designing, building, and maintaining SaaS environments
  • 3+ years as an SRE within a large team, with solid experience with Prometheus, Grafana, Datadog, ELK, etc.
  • 3+ years building and running Kubernetes clusters with expertise in scaling, operators, and troubleshooting
  • 3+ years of experience with observability (monitoring, logging, tracing, metrics)
  • 3+ years of experience with GitOps CI/CD processes
  • 3+ years of scripting experience with Python, Go, Bash, and AWS CLI tools
  • 3+ years of working knowledge of security operations, policies, infrastructure, key management, and encryption at rest and in transit
  • 3+ years of experience implementing and maintaining disaster recovery strategies (MySQL, ZooKeeper, etc.)

About Rackspace Technology

We are multicloud solutions experts, combining our expertise with leading technologies across applications, data, and security to deliver end-to-end solutions. We have a proven record of advising customers, designing scalable solutions, and optimizing returns. Named a best place to work repeatedly by Fortune, Forbes, and Glassdoor, we attract and develop world-class talent. Join us to embrace technology, empower customers, and deliver the future.

More on Rackspace Technology

Though we’re all different, Rackers thrive through our shared goal: to be valued members of a winning team on an inspiring mission. We bring our whole selves to work and believe that diverse perspectives fuel innovation and better serve our customers and communities worldwide. We welcome your application and are committed to equal employment opportunity without regard to age, race, gender, disability, or other protected characteristics. If you need accommodation, please let us know.
