Intermediate Site Reliability Engineer, Observability

Tbwa Chiat/Day Inc

Canada

Remote

CAD 100,000 - 125,000

Full time

30+ days ago

Job summary

An innovative company is seeking an Intermediate Site Reliability Engineer to enhance observability within their SaaS environments. This role involves building and maintaining observability tools, developing monitoring systems, and collaborating with engineering teams to optimize resource consumption. The ideal candidate will have experience with Infrastructure as Code technologies, large systems reasoning, and a passion for teamwork. Join a forward-thinking organization that values contributions from all team members and fosters an inclusive culture where everyone can thrive.

Qualifications

  • Experience with Infrastructure as Code technologies and libraries powering GitLab.
  • Ability to reason about large systems and their operational behaviors.

Responsibilities

  • Build and maintain observability tools and systems for GitLab's SaaS environments.
  • Collaborate with engineering teams to resolve architectural bottlenecks.

Skills

Infrastructure as Code
Large Systems Reasoning
Collaboration

Tools

Grafana
ELK Stack
Ansible
Terraform
Kubernetes
AWS
GCP

Job description

GitLab is an open core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world. When everyone can contribute, consumers become contributors, significantly accelerating the rate of human progress. This mission is integral to our culture, influencing how we hire, build products, and lead our industry. We make this possible at GitLab by running our operations on our product and staying aligned with our values.

Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other GitLab production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople that apply sound engineering principles, operational discipline, and mature automation to our environments and the GitLab codebase.

The Observability Team's mission is to Build, Run and Own the entire lifecycle of the suite of services that enable observability of the GitLab SaaS environments. These services allow Infrastructure, Development and Product teams to observe how their code runs on GitLab’s SaaS Platforms and contribute to our overall reliability and scalability goals.

As an SRE you will:
  • Build: Take a Platform-first approach to solving problems. Our Observability stack needs to be extended to support our growth and we need engineers to focus on how to build solutions that enable the whole organization to scale.
  • Maintain: Look after our metrics environment, as well as the tools and processes we have developed to provide this information throughout the company.
  • Plan: Develop monitoring and alerting systems that predict capacity needs based on customer usage patterns. Plan for new service rollouts and the expansion of existing services, and prepare advice for customers on optimizing their resource consumption.
  • Respond: Take part in an on-call rota, which is a requirement of this role.
  • Partner: Act as Subject Matter Expert for metrics gathering, observability guidelines, and capacity planning.
  • Collaborate: Work with other engineering stakeholders to resolve larger architectural bottlenecks, contributing a large-scale operational point of view. Work in close collaboration with software development teams.

You may be a fit for this role if you:
  • Have experience with Infrastructure as Code technologies and the libraries powering GitLab.
  • Have experience with Grafana’s LGTM stack, or Elastic’s stack (ELK).
  • Are able to reason about large systems: how they work, how they can be operated at scale, and their edge cases, failure modes, and behaviors.
  • Enjoy working with peers and collaborating across teams to deliver unique solutions to various technical challenges.
  • Are able to leverage GitLab as your day-to-day go-to tool.
  • Share our values, and work in accordance with those values.

Projects you could work on:
  • Work on GitLab core projects such as GitLab Rails, GitLab Workhorse, and Gitaly.
  • Code infrastructure automation with Ansible and Terraform, and work with managed Kubernetes platforms.
  • Work on the GitLab observability stack (e.g. ELK, Prometheus, Grafana).
  • Interact with various cloud provider systems (e.g. GCP, AWS).

Please note that we welcome interest from candidates with varying levels of experience; many successful candidates do not meet every single requirement. If you're excited about this role, please apply and allow our recruiters to assess your application.

GitLab is proud to be an equal opportunity workplace and is an affirmative action employer. GitLab’s policies and practices relating to recruitment, employment, career development and advancement, promotion, and retirement are based solely on merit, regardless of race, color, religion, ancestry, sex (including pregnancy, lactation, sexual orientation, gender identity, or gender expression), national origin, age, citizenship, marital status, mental or physical disability, genetic information (including family medical history), discharge status from the military, protected veteran status, or any other basis protected by law. GitLab will not tolerate discrimination or harassment based on any of these characteristics.
