Intermediate Site Reliability Engineer, Foundations

GitLab

Canada

Remote

CAD 100,000 - 125,000

Full time

14 days ago

Job summary

An established industry player is seeking an Intermediate Site Reliability Engineer to ensure the smooth operation of user-facing services and production systems. This role involves designing scalable networking infrastructure, collaborating with cross-functional teams, and leading initiatives through project management. The ideal candidate will have expertise in Google Cloud Platform, Terraform, and the Kubernetes ecosystem, along with strong programming skills in Ruby or Go. Join a dynamic team that thrives on innovation and automation, tackling unique challenges in a rapidly evolving environment. If you are proactive and enjoy diverse tasks from project work to emergency responses, this opportunity is perfect for you.

Qualifications

  • Expertise in Google Cloud Platform and networking.
  • Experience with Terraform and configuration management tools.

Responsibilities

  • Design and implement scalable networking infrastructure.
  • Respond to incidents on an on-call rotation and participate in reviews.
  • Automate operational tasks to enhance efficiency.

Skills

Google Cloud Platform
Networking (VPCs, subnets, load balancers)
Terraform
Ansible
Chef
Kubernetes
Ruby
Go
Network protocols (TCP/IP, HTTP/HTTPS, DNS)
Scripting (Ruby, Go, Bash)

Tools

GitLab CI

Job description

Intermediate Site Reliability Engineer, Foundations

Remote, Canada

GitLab is an open core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world. When everyone can contribute, consumers become contributors, significantly accelerating the rate of human progress. This mission is integral to our culture, influencing how we hire, build products, and lead our industry. We make this possible at GitLab by running our operations on our product and staying aligned with our values.

An overview of this role

GitLab is a complete DevOps platform, delivered as a single application. From project planning and source code management to CI/CD, monitoring, and security, we help teams deliver software faster and more efficiently while strengthening their security and compliance postures.

As an Intermediate Site Reliability Engineer (SRE) at GitLab, you are responsible for keeping all user-facing services and other GitLab production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our operating environments and the GitLab codebase.

GitLab SREs specialize in systems (operating systems, storage subsystems, networking) and implement best practices for availability, reliability, and scalability, with varied interests in algorithms and distributed systems.

What you’ll do

  • Design and implement a highly scalable networking infrastructure to support the needs of current and future GitLab platforms and offerings.
  • Collaborate closely with cross-functional teams and other teams throughout Infrastructure-Platforms on projects to drive GitLab’s future.
  • Respond to incidents on an on-call rotation (our team is distributed globally, so you are only on call during your daytime hours!) and participate in incident review.
  • Lead initiatives through problem definition, scoping, design, and project management.
  • Act as a subject matter expert within the GitLab Infrastructure-Platforms department, specializing in our networking and rate limiting services.
  • Automate every operational task.

What you’ll bring

  • Google Cloud Platform expertise, specifically around networking (VPCs, subnets, load balancers), GKE configuration, and scaling.
  • Experience with Terraform infrastructure as code.
  • Experience with configuration management tools such as Ansible and Chef.
  • Experience with the Kubernetes ecosystem, including Helm.
  • Programming skills and professional experience in Ruby or Go.
  • Understanding of network protocols (TCP/IP, HTTP/HTTPS, DNS).
  • Familiarity with network observability tools and traffic analysis.
  • Comfort with scripting languages (Ruby, Go, Bash) for automation.
  • Experience with GitLab CI or equivalent.
  • Ability to clearly define problems and think beyond initial solutions, looking at how to make things better in the future.
  • An independent, proactive, and self-organized mindset.
  • Strong ability to clearly communicate asynchronously.
  • Excitement about doing something different every day, from project work to production change requests to emergency response.

About the team

The Production Engineering Foundations team owns the networking infrastructure for GitLab from edge to ingress. Running the largest GitLab instance in existence (and in fact, one of the largest single-tenancy open-source SaaS sites on the Internet) means we are constantly faced with unique and rewarding challenges that directly impact our users every day. Our future is all about increasing automation and enabling other teams by building paved roads for things like rate limiting and edge networks, so we can continue to scale even bigger with enterprise-level expectations around reliability and availability.
