Grupo QuintoAndar | Senior Site Reliability Engineer (Observability)

Grupo QuintoAndar

Brazil

On-site

BRL 120,000 - 160,000

Full-time

Posted 2 days ago

Job summary

A leading real estate technology firm in Brazil is looking for a Senior Site Reliability Engineer specialized in Observability. The role involves maintaining cloud infrastructure, supporting observability practices, and ensuring scalability and reliability across services. The ideal candidate should have solid experience with observability tooling, container orchestration, and automation of workflows. Join a high-performance team in a remote-first environment aimed at innovating the housing journey.

Benefits

Profit sharing
Health insurance
Employee assistance program
Extended parental leave

Qualifications

  • Solid experience provisioning and maintaining cloud infrastructure.
  • Hands-on experience with observability tooling (metrics, logs).
  • Ability to identify and fix performance and reliability issues.

Responsibilities

  • Provision and maintain cloud infrastructure.
  • Automate workflows and improve CI/CD pipelines.
  • Support and expand the observability platform.

Skills

Observability practices
OpenTelemetry infrastructure
Monitoring tools (Prometheus, Grafana)
Kubernetes
Python
Infrastructure as code

Tools

Prometheus
Grafana
OpenTelemetry
Terraform

Job description

About Grupo QuintoAndar

We are Grupo QuintoAndar, the largest real estate ecosystem in Latin America. Guided by a shared purpose of helping people love where they live, we have a diversified portfolio of brands and solutions across different countries in Latin America, covering all phases of the housing journey. We also have a Technology Hub in Portugal. We develop technology and innovation to transform and enhance the overall living experience.

With the support of a world‑class team of investors and advisors, including Kaszek, Qualcomm, General Atlantic, and SoftBank, Grupo QuintoAndar is currently valued at over USD 5.1 billion and continues to grow year over year.

Here, you will work with top professionals in the market, in an environment that breathes innovation, collaboration, and high performance. To learn more about our story, visit: https://grupoquintoandar.com/pt/

Location & Remote Work

Our technology team operates under a "remote‑first" model, which means we work from home and can live anywhere in Brazil. We also offer the option of working from our São Paulo offices or partner coworking spaces, up to twice a week.

Hiring Process Stages

  • People interview
  • Tech Interview 2 | Debugging Interview

About the Team

As a Site Reliability Engineer focused on Observability, you will help build and maintain our cloud infrastructure while enabling teams to better understand and operate their systems.

You’ll work closely with product engineering teams to ensure services are observable, scalable, secure, and resilient.

Your activities will include provisioning cloud infrastructure, evolving our observability stack (metrics, logs, and traces), defining and maintaining SLIs/SLOs, automating workflows, improving CI/CD pipelines, identifying and correcting performance issues, and developing tools that enhance the daily experience of our engineers.

We are strong adopters of OpenTelemetry and are continuously evolving our instrumentation strategy across all services.

Picture an observability platform that we own: no Datadog training wheels, no Dynatrace magic carpets. It’s built on OpenTelemetry from collector to UI, so every metric, log, trace (and any future shiny signal) is ours to shape. Your job is to keep that beast humming:

  • Provision and tune the cloud plumbing that powers the platform.
  • Grow QuintoAndar Observability—All telemetry, for All services, in All environments, All the time, available to All engineers.
  • Define and guard the SLIs/SLOs that tell us when reality drifts from "supposed to."
  • Automate anything that moves twice (workflows, dashboards, data retrieval, you name it).
  • Hunt down performance gremlins, with the help of the rest of the engineering team, before they nibble production.
  • Build tools that make every engineer’s day 10% less painful, and brag-worthy.

TL;DR: You’ll be the custodian of a home‑grown, company‑wide observability stack, wiring it, scaling it, and making sure it never blinks. If that sounds fun, bring your cape.

Some real use cases we've worked on:

Observability & Incident Analysis

  • Evolved our observability platform using Prometheus, Thanos, Grafana, Loki, Tempo, and Faro over a full OpenTelemetry stack to provide deep visibility across our systems;
  • Built data pipelines to analyze incident metrics, helping us reduce MTTR and understand patterns across environments;
  • Worked alongside engineering teams to define and monitor (and fight for) SLIs/SLOs for key APIs, improving reliability and customer experience;
  • Led workshops and internal sessions on instrumentation and observability best practices using OpenTelemetry and our observability tools;
  • Improved our internal observability infrastructure to reduce costs, latency, and downtime.

Platform Engineering & Developer Experience

  • Created custom Kubernetes operators in Golang to automate infrastructure lifecycle and reduce manual interventions;
  • Built our internal CLI (QLI) to help developers manage resources, debug environments, and access observability data more easily;
  • Migrated our continuous delivery platform to GitOps without disrupting workflows, while supporting over 300 daily production deployments.

Security & Infrastructure

  • Designed a centralized solution for services to connect to databases using temporary credentials, improving security posture;
  • Segmented AWS accounts to provide better cost visibility, access control, and separation of concerns across teams;
  • Developed tools that enhance both security and observability without creating friction for engineers;
  • Ensured best security practices for our open-source tools;
  • Partnered with developers to investigate complex, production-level issues using logs, metrics, and distributed tracing;
  • Supported teams in onboarding to our Kubernetes environment, ensuring applications are properly monitored and alerting is in place from day one.

Requirements

You will:

  • Provision and maintain our cloud infrastructure;
  • Identify and fix performance and reliability issues;
  • Operate and evolve our Kubernetes clusters;
  • Build tools that improve engineering workflows and visibility;
  • Support and expand our observability platform with metrics, logs, traces and profiling.

What we are looking for:

  • Solid experience with observability practices and tooling (metrics, logs, and traces);
  • Hands‑on experience with OpenTelemetry infrastructure and instrumentation;
  • Familiarity with monitoring tools like Prometheus, Grafana, Loki, or similar;
  • Ability to define and maintain SLIs/SLOs aligned with product and engineering goals;
  • Experience with container orchestration platforms (Kubernetes, ECS);
  • Understanding of CI/CD workflows and delivery automation;
  • Proficiency in at least one programming language (we primarily use Python and Golang);
  • Knowledge of infrastructure as code tools (Terraform, Crossplane, and/or Pulumi).

You will stand out if you have:

  • Knowledge of microservice architecture and distributed systems;
  • Additional experience with GitOps, Kafka, CDN, Gateway APIs, or similar tools;
  • Knowledge of JVM-based programming languages.

Important

  • Our hiring process starts with the application! If you truly want to be part of our team, please complete this step of the process. We analyze all candidates individually and provide feedback to all applicants.
  • All communication will be conducted via email, so please keep an eye on your inbox and add the domain @quintoandar.com.br to your allowlist so that our emails are not sent to spam.

Benefits

  • Profit sharing
  • Health insurance
  • Life insurance
  • Childcare subsidy and Atypical Parenthood subsidy
  • Home office allowance
  • Employee assistance program (mental health, social, legal, and financial support)
  • Extended parental leave
  • Day off on birthday, Mother’s Day, and Father’s Day
  • Benefits Club (discounts on everyday services)
  • Discounts at educational institutions

Diversity & Inclusion at Grupo QuintoAndar

We value diversity and want everyone to feel welcome here, regardless of their age, gender identity, sexual orientation, race, color, ethnicity, origin, disability, religion, or any other characteristic. All our job openings are open to all individuals!

You’ll notice there are some diversity questions in the application form. For affirmative action roles, this information may be used to verify your alignment with the target audience for the opportunity. In such cases, it may be used for elimination purposes. For non‑affirmative action roles, this data will be used anonymously, exclusively to monitor and improve our inclusion practices in the hiring process, and will have no impact on your application.

Privacy and Data Protection

Grupo QuintoAndar operates in compliance with privacy and data protection laws, including, but not limited to, the Brazilian General Personal Data Protection Law (LGPD) (Law No. 13,709/2018), and ensures the security of your data. To learn more, please access our Privacy Notice for Candidates. For questions or to exercise your rights as a data subject, please contact us through our Service Channel.
