Intermediate Site Reliability Engineer, Database Operations

GitLab

Canada

Remote

CAD 80,000 - 110,000

Full time

Today

Job summary

A leading software company is seeking an Intermediate Site Reliability Engineer specializing in PostgreSQL to join their Database Operations team. This role involves automating operational tasks, enhancing database reliability, and supporting product teams. Candidates should have hands-on experience with PostgreSQL in high-growth environments and be proficient in automation tools like Chef and Ansible. This fully remote position is open to candidates in Canada.

Qualifications

Experience with high-growth production environments using PostgreSQL.
Hands-on experience using PostgreSQL internals for troubleshooting.
Experience with infrastructure automation tools.

Responsibilities

Automate operational tasks for user-facing services.
Respond to platform emergencies and customer escalations.
Collaborate with product teams to enhance database reliability.

Skills

Experience running PostgreSQL

Infrastructure automation

Understanding of SQL and PL/pgSQL

Excellent written and verbal English communication

Data modeling and data structure design

Tools

Chef

Ansible

Terraform

GitLab is an open-core software company that develops the AI-powered DevSecOps Platform used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world. We embrace AI as a core productivity multiplier and expect team members to incorporate AI into daily workflows to drive efficiency, innovation, and impact.

Overview

Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other GitLab production systems running smoothly. SREs blend pragmatic operations with software craftsmanship, applying engineering principles, operational discipline, and automation to our environments and the GitLab codebase. We specialize in systems, including networking, the Linux kernel, and distributed systems.

The Database Operations team’s mission is to build, run, own and evolve the entire lifecycle of the PostgreSQL database engine for GitLab.com. The team focuses on reliability, scalability, evolution, performance, and security of the database engine and its supporting services. We build services on top of Reliability::Foundations services and cloud vendor managed products where appropriate to reduce complexity and deliver new capabilities faster. GitLab.com is one of the largest single-tenancy open-source SaaS sites on the internet and the knowledge from this team informs other engineering groups and customers running self-managed installations.

Responsibilities

Automate every operational task as a core requirement (e.g., package updates, configuration changes across environments, automatic provisioning tools for user-facing services).
Respond to platform emergencies, alerts, and escalations from Customer Support.
Ensure systems exist to manage software lifecycles (e.g., operating systems) with minimal manual effort.
Develop a fully automated multi-environment observability stack and extend it to predict capacity based on usage patterns.
Plan for new service roll-outs, expansion and capacity management of existing services, and work with users to optimize resource consumption.

As an SRE you will

Work on database reliability and performance for GitLab.com within the SRE team and with product teams.
Analyze solutions and implement best practices for PostgreSQL clusters and components.
Improve observability of database metrics to meet objectives.
Collaborate with peer SREs to roll out changes and mitigate production incidents.
Provide on-call support on rotation.
Provide database expertise to engineering teams (e.g., reviews of migrations, queries and performance optimizations).
Automate database infrastructure and provide self-service tools for engineering.
Use GitLab to run GitLab.com as a first resort, and help improve the product.
Plan the growth of GitLab’s database infrastructure.
Design, build and maintain core database infrastructure components to support high concurrency.
Support and debug production issues across services and stack levels.
Monitor and alert on symptoms rather than outages; document actions for repeatability and automation.

You may be a fit for this role if you

Have primary experience running PostgreSQL in high-growth production environments, using both self-managed deployments (VMs, Kubernetes with PostgreSQL operators) and DBaaS services.
Have hands-on experience applying knowledge of PostgreSQL internals to design, build, and troubleshooting work.
Have experience with infrastructure automation and configuration management (Chef, Ansible, Puppet, Terraform).
Have a solid understanding of SQL and PL/pgSQL.
Have significant experience in a large SaaS distributed systems production environment.
Align with our values and collaborate accordingly.
Have excellent written and verbal English communication skills, suited to asynchronous collaboration.
Document everything to enable rapid delivery and iteration.
Have a proactive, go-for-it attitude: when something is broken, you work to fix it.
Have solid data modeling and data structure design skills.
Bonus: Programming skills as a backend engineer (Ruby and/or Go).
Bonus: Experience with ClickHouse or other modern OLAP databases.

Projects you could work on

Review, analyze and implement solutions for database administration (backups, performance tuning).
Build automation with Ansible, Terraform, and Chef for replicas, testing, and backup monitoring.
Provide self-service tools for engineers using GitLab ChatOps.
Offer technical assistance on database design methodologies and tuning.
Review database migrations and changes from engineering teams.
Recommend query and schema changes to optimize performance.
Respond to production incidents to mitigate database-related issues.
Contribute to infrastructure design and scalability considerations focused on data storage.
Plan steps to scale the database for future needs.
Design and develop specifications for future database requirements, including capacity planning and evaluations of alternatives.

Intermediate Site Reliability Engineer Criteria

Technical

Expertise in at least one area of SRE work, with general knowledge across areas.
Ability to mentor junior team members.
Contribute small improvements to the GitLab codebase to resolve issues.

Execution

Identify projects that yield substantial cost savings or revenue.
Suggest product architecture changes from reliability, performance and availability perspectives using data-driven approaches.
Improve efficiency and capacity planning to reduce resource usage and cost for customers.
Identify parts of the system that do not scale, provide immediate fixes and drive long-term resolution.
Identify SLIs to align the team with availability and latency objectives.

Collaboration and Communication

Thrive in a fully remote, asynchronous environment with an emphasis on documentation and written communication.
Develop domain expertise and share knowledge widely.
Participate in blameless RCAs to prevent recurrence of incidents.

Influence and Maturity

Lead junior SREs by example.
Develop ownership of a major part of the infrastructure.
Be trusted to de-escalate conflicts within the team.

Performance Indicators

Site Reliability Engineers have the following job-family performance indicators.

Country Hiring Guidelines: GitLab hires globally. All roles are remote, though location-based eligibility may apply. Our Talent Acquisition team can answer questions about location once you have started the process.

GitLab is an equal opportunity workplace and an affirmative action employer. Our recruitment and employment practices are merit-based and non-discriminatory. If you require accommodation during the interview process, please let us know.

Equal Employment Opportunity and Accessibility

GitLab is an equal opportunity workplace. If you require accessibility adjustments, please indicate during the interview process. We are committed to an accessible interview experience.
