Senior Site Reliability Engineer

ZipRecruiter

Raleigh (NC)

Remote

USD 128,000 - 193,000

Full time

24 days ago

Job summary

An innovative company is seeking a Site Reliability Engineer to join their dynamic team. This role offers the chance to shape the reliability and performance of their cutting-edge cloud infrastructure. You'll collaborate with various engineering teams to implement scalable systems, manage service level objectives, and enhance incident response processes. This is a unique opportunity to make a significant impact in a fast-paced environment, working with a passionate team dedicated to delivering exceptional cloud services. If you're ready to take your career to the next level and be part of a growing organization, this position is perfect for you.

Benefits

Flexible work environment
Healthcare contributions
Equity in the company
Flexible time off
$500 Home office setup
Employee-driven international mobility

Qualifications

  • 8+ years of experience in Site Reliability Engineering or related field.
  • Strong knowledge of cloud platforms like AWS, Azure, or Google Cloud.

Responsibilities

  • Collaborate with engineering teams to design scalable and secure systems.
  • Manage service level objectives and incident response processes.

Skills

Site Reliability Engineering
Go
Python
Cloud Computing
Distributed Databases
SQL
Kubernetes
Docker
Ansible
Terraform

Education

Bachelor's degree in Computer Science
Master's degree in Computer Science

Tools

Kubernetes
Docker
Ansible
Terraform

Job description

About ClickHouse

We are the company behind the popular open-source, high-performance columnar OLAP database management system for real-time analytics. ClickHouse works 100-1000x faster than traditional approaches. By offering a true column-based DBMS, it allows systems to generate reports from petabytes of raw data with sub-second latencies. With an amazing community already adopting our open-source technology, we are now embracing our journey in delivering cloud-first solutions to delight our customers.

With top adopters such as Lyft, Cisco, and eBay - not only do our products work at lightning speed, so do we.

We are an open and collaborative company. Our colleagues are curious, engaged and excited about what they do. If you want to work in an environment where you can learn, grow, be an agent of change and have your voice heard - then please read on!

Note: This position can be based remotely in any country where ClickHouse has a hiring presence.

We are committed to providing our customers with reliable and secure services at ClickHouse. To continue this, we are building out our Site Reliability Engineering team. As one of the first joiners to our Reliability Engineering Team at ClickHouse, you will be responsible for building and leading processes to ensure the reliability, availability, scalability, and performance of our cloud infrastructure that runs ClickHouse databases. You will collaborate with different teams like Control Plane, Dataplane, Core, Security, Support and Operations and guide them to design and implement scalable, secure, highly available and fault-tolerant distributed systems. You will also own the areas of incident management and response, post-mortem analysis including running blameless postmortems, and continuous improvement of our ClickHouse services. You will be leveraging your software engineering expertise to develop software platforms and tools to optimize the operational and engineering efficiencies of ClickHouse Cloud. This role is a unique opportunity to make a significant impact on our elastic, limitless scale, high-performance, serverless ClickHouse Cloud.

What will you do?

  • Collaborate with various engineering teams in ClickHouse to design and implement scalable, secure, and highly available systems for ClickHouse.
  • Establish and manage service level objectives (SLOs) and service level agreements (SLAs) for ClickHouse Cloud.
  • Ensure all the infrastructure components in ClickHouse Cloud (including Dataplane, Control Plane and ClickHouse Core) have monitoring and alerting in place to ensure timely detection and resolution of incidents.
  • Enhance and refine incident response processes and post-mortem analysis for any outages in ClickHouse Cloud, including working with the support team to communicate with impacted customers.
  • Continuously improve the reliability and performance of our ClickHouse services.
  • Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities.
  • Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.

About you:

  • Bachelor's or Master's degree in Computer Science or a related field.
  • At least 8 years of experience in Site Reliability Engineering or a related field.
  • Previous experience using ClickHouse in production.
  • Coding experience with Go and/or Python.
  • Strong knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
  • Excellent understanding of distributed databases and SQL; experience with ClickHouse in particular is a major plus.
  • Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.
  • Strong experience with automation and configuration management tools such as Ansible, Terraform, or Puppet.
  • You are a strong problem-solver and have solid production debugging skills.
  • You are passionate about efficiency, availability, scalability, and data governance.
  • You thrive in a fast-paced environment as part of a global team, and you see yourself as a partner with the business with the shared goal of moving the business forward.
  • You have a high level of responsibility, ownership, and accountability.
  • Excellent communication and interpersonal skills.

  • New York Area / San Francisco Area - Salary Range: $151,000 - $226,000 USD
  • General US Remote - Salary Range: $128,350 - $192,100 USD
  • Los Angeles, CA / Washington, DC - Salary Range: $135,900 - $203,400 USD

Compensation

This role offers cash compensation and a stock options grant. For roles based in the United States, you can find above our typical starting salary ranges for this role, depending on your specific location.

The positioning of offers within a certain range depends on various factors, including: candidate experience, qualifications, skills, business requirements and geographical location.

If you have any questions or comments about compensation as a candidate, please get in touch with us at paytransparency@clickhouse.com.

Perks

  • Flexible work environment - ClickHouse is a distributed company offering remote-first work to all employees
  • Healthcare - Employer contributions towards your healthcare.
  • Equity in the company - Every new team member who joins our company receives stock options.
  • Time off - Flexible time off in the US, generous entitlement in all countries.
  • A $500 Home office setup if you're a remote employee.
  • Employee-driven international mobility - we enable you to relocate internationally if you wish (within certain countries and timelines and subject to role requirements, time zones and work permit considerations)

Culture - We All Shape It

As part of our first 500 employees, you will be instrumental in shaping our culture.

We look for candidates who are:

  • Motivated by doing great work as part of a team :)
  • Open to learning from others and sharing with others
  • Team Players: helpful, resourceful, responsive
  • Respectful and see feedback as an opportunity to grow

Are you interested in finding out more about our culture? We are a one-year-old company, so we are excited to be building it together right now. Our first 500 employees are the culture shapers of our future. Check out our blog posts or follow us on LinkedIn to find out more about what's important to us, and to find out if you'd like to come and contribute to building our culture with us!
