Senior Site Reliability Engineer, Database Operations: ClickHouse

GitLab Inc.

New York (NY)

Remote

USD 117,000 - 252,000

Full time

30+ days ago

Job summary

Join a forward-thinking company as a Senior Site Reliability Engineer specializing in database operations. In this exciting role, you will design, build, and maintain high-availability ClickHouse and PostgreSQL clusters, ensuring the reliability of critical services. Collaborate with a globally distributed team to optimize cloud infrastructure and enhance security measures. This innovative firm values contributions from all team members, fostering an environment where your expertise will directly impact the success of their operations. If you're passionate about database management and cloud automation, this opportunity is perfect for you.

Qualifications

  • Advanced experience in managing PostgreSQL and ClickHouse databases at scale.
  • Strong skills in cloud infrastructure automation and incident management.

Responsibilities

  • Design and maintain ClickHouse and PostgreSQL clusters for enterprise workloads.
  • Implement monitoring tools to ensure system reliability and performance.

Skills

Database Management
Cloud Infrastructure Automation
Programming (Go, Ruby, Python)
Linux Expertise
Incident Management
Monitoring Implementation
Communication Skills

Tools

PostgreSQL
ClickHouse
Ansible
Chef
Terraform
Kubernetes
Prometheus
Grafana

Job description

GitLab is an open core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world. When everyone can contribute, consumers become contributors, significantly accelerating the rate of human progress. This mission is integral to our culture, influencing how we hire, build products, and lead our industry. We make this possible at GitLab by running our operations on our product and staying aligned with our values.

Role Overview

Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other GitLab production systems running smoothly 24x7x365. SREs are a blend of pragmatic operators and software craftspeople that apply sound engineering principles, operational discipline, and mature automation to our environments and the GitLab codebase. We specialize in systems, whether it be networking, the Linux kernel, or some more specific interest in scaling, algorithms, or distributed systems.

Key Responsibilities

  • Design, build, and maintain ClickHouse and PostgreSQL clusters to support high-demand, enterprise-scale workloads.
  • Provision and orchestrate cloud infrastructure using configuration management tools (Ansible, Chef), IaC (Terraform), and the Kubernetes ecosystem (Helm charts, Operators) in GCP.
  • Design and implement enterprise-grade, high-availability ClickHouse solutions with ClickHouse Keeper, sharding, and replication, optimized for large-scale and dynamic datasets.
  • Optimize and scale high-transaction PostgreSQL clusters with Patroni and streaming replication for GitLab’s core applications on GCP.
  • Build and maintain early warning systems, monitoring, and alerting tools (e.g., Prometheus/Grafana) to predict capacity needs, monitor query latency and replication lag, and ensure resource optimization across platforms.
  • Enable cross-database integrations and workflows, such as ClickHouse-to-PostgreSQL data federation, CDC, and logical replication.
  • Respond to platform alerts, user emergencies, and support requests while ensuring strict adherence to SLOs, including during SRE on-call rotations.
  • Enhance infrastructure security by implementing and updating measures that protect GitLab’s systems and ensure compliance with regulatory requirements (e.g., GDPR, FedRAMP, SOC2, ISO).
  • Partner with internal and external compliance assessors as Subject Matter Experts during certifications and recertifications.
  • Collaborate with engineering teams to address architectural bottlenecks, plan service rollouts and migrations, and shape the future roadmap while maintaining strong operational readiness.

Team Overview

The Database Operations Engineer at GitLab is responsible for building, running, owning, and evolving the entire lifecycle of the database engines for GitLab.com. We are a globally distributed team of five, spanning EMEA, APAC, and the Americas, with plans to grow further. Our mission is to ensure the reliability and continuous evolution of the database infrastructure behind GitLab.com and beyond.

Mandatory Technical Skills and Experience

  • Advanced database platform management experience, preferably using Postgres and ClickHouse at scale.
  • Advanced Cloud Infrastructure automation and management, preferably using Ansible, Chef, Terraform, Helm charts, Operators, and Kubernetes.
  • Solid experience with at least one programming language: Go, Ruby, or Python.
  • Advanced experience with Linux.
  • Extensive on-call experience as an SRE supporting mission-critical systems.
  • Solid incident management experience, across all phases: Analysis, Remediation, RCA, and Corrective Actions.
  • Solid experience implementing monitoring at scale (preferably Prometheus and Grafana).

Mandatory Non-Technical Skills and Characteristics

  • Willingness and ability to live and promote GitLab's unique CREDIT Values in one's day-to-day work and interactions with teammates.
  • Superior verbal and written communication skills.
  • Cool, collected, and composed under pressure.
  • Comfortable and productive working asynchronously across time zones and cultures, at the speed and scale of business.
  • Enable others to excel.
  • Be a Leader of One.
  • Act Like an Owner with GitLab's resources.

Please note that we welcome interest from candidates with varying levels of experience; many successful candidates do not meet every single requirement. If you're excited about this role, please apply and allow our recruiters to assess your application.

Salary Range

The base salary range for this role’s listed level currently applies to residents of the listed locations only. Grade level and salary ranges are determined through interviews and a review of the applicant's education, experience, knowledge, skills, and abilities, as well as equity with other team members and alignment with market data.

California/Colorado/Hawaii/New Jersey/New York/Washington/DC/Illinois/Minnesota pay range: $117,600 - $252,000 USD.

Country Hiring Guidelines

GitLab hires new team members in countries around the world. All of our roles are remote; however, some roles may carry specific location-based eligibility requirements. Our Talent Acquisition team can help answer any questions about location after starting the recruiting process.

Equal Opportunity Employer

GitLab is proud to be an equal opportunity workplace and is an affirmative action employer. GitLab’s policies and practices relating to recruitment, employment, career development and advancement, promotion, and retirement are based solely on merit, regardless of race, color, religion, ancestry, sex (including pregnancy, lactation, sexual orientation, gender identity or gender expression), national origin, age, citizenship, marital status, mental or physical disability, genetic information, discharge status from the military, protected veteran status, or any other basis protected by law.
