Senior Site Reliability Engineer

Underdog Fantasy

United States

Remote

USD 90,000 - 150,000

Full time

30+ days ago

Job summary

An innovative firm is seeking a Site Reliability Engineer to enhance their cloud infrastructure and web applications. This role involves owning the incident response process, leading capacity planning, and collaborating on architecture decisions to ensure high availability. The company, rapidly growing and valued at over $1.2 billion, offers a dynamic environment where your contributions will directly impact the future of sports gaming. With a commitment to flexibility and employee well-being, this opportunity is perfect for someone eager to take ownership and thrive in a collaborative setting. Join a team that values bold ideas and cutting-edge technology in the exciting world of sports gaming.

Benefits

Unlimited PTO
16 weeks fully paid parental leave
$500 home office allowance
401k match
Company paid health, dental, vision plans

Qualifications

  • 6+ years in site reliability engineering or cloud infrastructure.
  • Strong communication and collaboration skills are essential.
  • Experience with multiple programming languages and frameworks.

Responsibilities

  • Own and maintain the incident response process and best practices.
  • Lead capacity planning initiatives for scalability and cost optimization.
  • Collaborate on architecture decisions to ensure high availability.

Skills

Site Reliability Engineering
Cloud Infrastructure
Web Application Development
Communication Skills
Collaboration
Data-Driven Decision Making
Ownership
Automation
Incident Response

Tools

AWS
Kubernetes
Datadog
PagerDuty
PostgreSQL
Redis

Job description

The fastest-growing sports gaming company – ever.

We build innovative games, products, and experiences for American sports fans.

We’re here to shake up the fastest-growing industry with bold ideas, custom-built tech, and the drive to win.

Founded in 2020, our team has built four of today’s most widely played fantasy games and launched the Underdog Sportsbook – built entirely in-house with our own technology. That means we control our product, move fast, and create experiences you won’t find anywhere else.

In just over two years, we’ve reached a valuation of over $1.2 billion, with investors like BlackRock, Spark Capital, SV Angel, Mark Cuban, Kevin Durant, and Adam Schefter. And we’re just getting started.

At Underdog, we believe that sports are for everyone. Join us.

What you’ll do:
  • Own and maintain the incident response process, including defining procedures, tools, and best practices
  • Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerts and reporting systems
  • Lead capacity planning initiatives, focusing on both short and long-term scalability while optimizing costs
  • Develop and implement disaster recovery plans, including regular testing and regulatory compliance
  • Collaborate with teams on architecture decisions to ensure high availability and scalability
  • Manage launch and event planning for high-traffic occasions, focusing on infrastructure preparation and capacity management (a.k.a. Launch Readiness)
  • Act as an internal expert and consultant for monitoring tools like Datadog and PagerDuty and infrastructure like AWS and Kubernetes
  • Emphasize automation and tooling to scale our workload
  • Jump in and out of repos written in languages like Ruby, Python, Go, TypeScript, Swift, Kotlin, and SQL to support the efforts described above
Who you are:
  • 6+ years of experience in site reliability engineering, cloud infrastructure, and/or web application development
  • A strong written and verbal communicator
  • Collaborative by nature
  • Someone who enjoys using research, data, and experiments to make decisions; you believe “Hope is not a strategy.”
  • You enjoy working directly with customers (generally engineers or other people inside the company)
  • You think long-term about what is best for the business and its customers
  • You are excited to take ownership
  • You are very comfortable around an IDE, working with multiple languages, multiple web application frameworks, AWS services, Kubernetes, and PostgreSQL
  • You can work independently to learn new languages/technologies as needed
  • You enjoy deploying changes to production quickly, multiple times a week if necessary
Even better if you have:
  • Experience with PostgreSQL SQL query optimization, tweaking autovacuum settings, table statistics, different index types, etc.
  • Experience with Redis/Valkey optimization
  • Experience with Datadog or similar products
  • Experience working as a web application developer, frontend or backend, especially in React and Ruby on Rails
  • Experience with AWS cost optimization
  • Have read the Google SRE books or similar books, or have other forms of SRE training
  • Actively use AI to augment your abilities and build knowledge in domains that interest you
What we can offer you:
  • Unlimited PTO (we're extremely flexible, with the exception of the few weeks before and into the start of the NFL season)
  • 16 weeks of fully paid parental leave
  • A $500 home office allowance
  • A connected, virtual-first culture with a highly engaged distributed workforce
  • 5% 401k match, FSA, company paid health, dental, vision plan options for employees and dependents
This position may require sports betting licensure based on certain state regulations.

Underdog is an equal opportunity employer and doesn't discriminate on the basis of creed, race, sexual orientation, gender, age, disability status, or any other defining characteristic.

