Site Reliability Engineer

Alchemy

San Francisco, California, New York (CA, MO, NY)

On-site

USD 135,000 - 275,000

Full time

30+ days ago

Job summary

Alchemy is seeking an Infrastructure Engineer focused on reliability to improve developer productivity and keep its products reliable as they scale. In this role, you will work with the engineering team to design, deploy, and continuously improve the infrastructure behind a globally used developer platform, and you will set Reliability standards and best practices across the organization. Compensation includes a base salary of USD 135,000 to 275,000, equity, and comprehensive benefits.

Benefits

Comprehensive medical coverage
Dental coverage
Vision coverage
401k
Unlimited flexible time off

Qualifications

  • 5+ years as an Infrastructure Engineer focused on Reliability.
  • Experience with large-scale, multi-region production systems.

Responsibilities

  • Set high standards for Reliability and develop best practices.
  • Architect production infrastructure that encourages high reliability.

Skills

Reliability Engineering
Site Reliability Engineering
Production Systems Design
Observability Best Practices
Communication Skills
Collaboration Skills

Education

Bachelor's Degree in Computer Science or related field

Tools

Prometheus
Grafana
Datadog
AWS
Docker
Kubernetes
Terraform
Pulumi
Chef
Puppet

Job description

Our mission is to bring web3 to a billion people by providing builders with the tools they need to build exceptional onchain products. Alchemy is the only complete developer platform that offers the powerful APIs, SDKs, and tools necessary to build and scale onchain apps and rollups.

Our infrastructure powers 70% of the top web3 teams, 90%+ of web2 companies building in web3 and 100+ million end users. Our customers include top web3 brands like Polymarket, OpenSea, Circle, WorldCoin, as well as major global brands like Shopify and Adobe.

The Alchemy team draws from decades of deep expertise in massively scalable infrastructure, AI, and blockchain from leadership roles at leading companies and universities like Google, Microsoft, Facebook, Stanford, and MIT.

We're backed by the world's leading VCs and institutions, including: Lightspeed, Silver Lake, a16z, Coatue, Pantera, Addition, Stanford University, Coinbase, and Charles Schwab, among others.

The Role

As an engineer in the Infrastructure department at Alchemy, you will collaborate with our engineering team to design, deploy, and continuously improve the infrastructure supporting our globally used developer platform. Your focus will be on enhancing developer productivity and ensuring product reliability as we scale.

The Infrastructure team’s mission is to provide the infrastructure, tooling and expertise that Alchemy engineers need to ship, scale and operate high-quality products for our customers in a fast, safe and cost-efficient manner.

Come and help us build, maintain and scale the underlying infrastructure that is required to build products that delight our customers when it comes to reliability, latency and cost.

What You'll Do:

  • Set high standards for Reliability at Alchemy
  • Develop and own company-wide Reliability best practices such as SLO definition, incident management, postmortem reviews, launch readiness reviews, and change management
  • Architect production infrastructure and tools that encourage and enforce high reliability
  • Inspire the broader engineering organization to make Reliability a first-class citizen in the products we build
  • Collaborate with, advise, review, and mentor engineering teams on Reliability topics such as high-reliability architecture, observability, and safe change management
  • Improve the critical infrastructure and systems used to operate infrastructure at scale (e.g., compute, networking, deployment, observability, code tooling/libraries)
  • Develop and own best practices for managing production infrastructure: provisioning, application scaling, configuration management, capacity planning, monitoring, etc.
  • Develop and own best practices for developer processes: CI/CD, dev and staging environments, etc.
  • Provide input into long-term platform requirements and operational guidelines with a focus on reliability
  • Continuously raise our standard of engineering excellence by implementing best practices for coding, testing, and deployment
  • Build and maintain documentation for processes and workflows

What We're Looking For:

  • 5+ years of experience as an Infrastructure Engineer focused on Reliability (e.g., Site Reliability Engineer, Production Engineer, Platform Engineer)
  • Experience leading and driving company-wide reliability efforts and engineering initiatives
  • Experience with observability best practices and tooling like Prometheus, Grafana and Datadog
  • Experience designing and operating large-scale, multi-region production systems
  • Experience working with AWS or other cloud infrastructures
  • Experience with container schedulers and runtimes such as Docker and Kubernetes
  • Experience with Infrastructure-as-Code (e.g., Terraform, Pulumi, Chef, Puppet)
  • The cross-functional nature of this role requires strong communication and collaboration skills
  • (Preferred) Experience running production services on bare metal
  • (Preferred) Experience with TypeScript and Python
  • (Preferred) Excellent understanding of web applications and architecture

More on The Role

Alchemy is committed to offering competitive compensation, including base salary as well as equity. Additionally, Alchemy offers comprehensive medical, dental, and vision coverage, as well as other benefits such as 401k and unlimited flexible time off.

The base salary range for this position is estimated to be between $135,000 and $275,000 annually. Please note this range reflects base salary only and does not include bonus, equity, or benefits. Your salary will be determined by various factors, including relevant experience, skill set, qualifications, and other business needs.
