Senior Site Reliability Engineer (SRE)

dLocal

Brazil

Remote

BRL 120,000 - 160,000

Full-time

Yesterday

Job summary

Join a leading fintech company as a Senior Site Reliability Engineer (SRE) where you will design and maintain observability systems for major clients like Netflix and Amazon. Work in a flexible, remote-first environment with a dynamic global team. You will utilize your expertise in Kubernetes and observability tools to enhance system reliability and performance while collaborating across teams to meet monitoring and alerting requirements.

Benefits

Remote work

Flexible schedules

Referral bonus program

Learning & development

Language classes

Social budget

dLocal Houses

Qualifications

Over 4 years of experience as an SRE or in a similar role focused on observability.
Expertise in Kubernetes and monitoring best practices.
Strong scripting abilities for automating observability tasks.

Responsibilities

Design and maintain observability pipelines using OpenTelemetry.
Build self-service automation for development teams.
Support incident management and design processes for incidents.

Skills

Kubernetes

OpenTelemetry

Python

Problem Solving

Tools

Grafana

Prometheus

Loki

New Relic

Datadog

Terraform

ArgoCD

GitHub Actions

Why should you join dLocal?

dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets. Global brands rely on us to increase conversion rates and simplify payment expansion effortlessly. As both a payments processor and a merchant of record where we operate, we make it possible for our merchants to make inroads into the world’s fastest-growing, emerging markets.

By joining us you will be a part of an amazing global team that makes it all happen, in a flexible, remote-first dynamic culture with travel, health and learning benefits, among others. Being a part of dLocal means working with 1000+ teammates from 30+ different nationalities and developing an international career that impacts millions of people’s daily lives. We are builders, we never run from a challenge, we are customer-centric, and if this sounds like you, we know you will thrive in our team.

What's the opportunity?

We are looking for a Site Reliability Engineer (SRE) to join our team! As our Site Reliability Engineer (SRE), you will be focused on the design, implementation and continuous maintenance of our centralized observability platform using OpenTelemetry (OTEL) as its backend. You will be part of a talented team that works on mission-critical applications with big customers like Netflix, Amazon, Nike, Facebook & more!

As a Site Reliability Engineer, you are always expected to ask the necessary questions:

What data do we need to understand how our systems are performing?
How do we collect this data?
What patterns are we looking for in the data and what do they mean?
Who should be notified when a certain system is not working properly?
Do we have any systems that we need more data for?

An SRE designs systems and processes to answer the questions above and to provide automated support and response where possible.

What will you do?

Own OpenTelemetry Pipelines: Design, implement, and maintain observability pipelines across the three main signals—logs, metrics, and traces—ensuring standardized, scalable, and efficient data ingestion. Optimize ingestion strategies to balance cost, performance, and usability
Empower Engineering Teams: Build self-service automation and tooling that enables development teams to instrument and leverage observability without requiring manual intervention from the SRE team. Drive adoption of best practices while ensuring teams own their telemetry
Support Incident Management: Be the Engineering side of our Incident Management Team, designing the processes, playbooks, checklists, and automations for them and other engineers to follow during an incident
Collaborate Across Teams: Interact with members from almost all teams across the business to understand their monitoring, alerting and SLO / SLA requirements and design systems and processes that ensure we meet or exceed these requirements. Influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development
Automate Observability Infrastructure: Leverage Infrastructure-as-Code (IaC) to provision and manage monitoring tools, alerting rules, and our observability configurations across OTEL Pipelines
Define Baseline Observability Standards: Design base level requirements for new and existing services to ensure that all dLocal infrastructure and code are monitored consistently and accurately at a basic level
Own Technical and Security Health: Take full ownership of dLocal’s infrastructure reliability, ensuring adherence to key availability and security KPIs
Optimize Alerting Systems: Continuously refine alerting signals to minimize noise and ensure they are always actionable, reducing fatigue and improving response efficiency

Which skills do you need?

Over 4 years of experience as an SRE or in a very similar role focused on observability
Expertise in Kubernetes, including its core components, deployment methodologies, and monitoring best practices
Some understanding of OpenTelemetry, including setting up OTEL collectors, instrumentation, and pipeline optimization
Proficiency with monitoring and logging tools such as Grafana, Prometheus, Loki, New Relic, or Datadog
Hands-on experience with IaC tools (Terraform) and GitOps CI/CD solutions (ArgoCD, GitHub Actions, or similar)
Experience integrating incident management platforms (PagerDuty, Jira) with automated alerting workflows
Strong scripting abilities (Python, Go, or similar) for automating observability tasks
A problem-solving mindset, with the ability to collaborate across multi-functional teams to drive reliability improvements

You will stand out if you have:

Cloud experience, especially AWS and ECS-based workloads
Experience managing observability pipelines at scale in high-throughput environments
Familiarity with Configuration-as-Code (Ansible, Chef, or SaltStack) for managing configurations across legacy instances
Database performance monitoring experience, particularly in large-scale distributed environments

What do we offer?

Besides the tailored benefits we have for each country, dLocal will help you thrive and go that extra mile by offering you:

Remote work: work from anywhere or one of our offices around the globe!*
Flexibility: we have flexible schedules and we are driven by performance
Fintech industry: work in a dynamic and ever-evolving environment, with plenty to build and boost your creativity
Referral bonus program: our internal talents are the best recruiters - refer someone ideal for a role and get rewarded
Learning & development: get access to a Premium Coursera subscription
Language classes: we provide free English, Spanish, or Portuguese classes
Social budget: you'll get a monthly budget to chill out with your team (in person or remotely) and deepen your connections!
dLocal Houses: want to rent a house to spend one week anywhere in the world coworking with your team? We’ve got your back!
*For people based in Montevideo (Uruguay) applying to non-IT roles, 55% monthly attendance to the office is required

What happens after you apply?

Our Talent Acquisition team is invested in creating the best candidate experience possible, so don’t worry, you will definitely hear from us. We will review your CV and keep you posted by email at every step of the process!

You can also check out our webpage, LinkedIn, Instagram, and YouTube for more about dLocal!

Seniority level

Not Applicable

Employment type

Full-time

Job function

Engineering and Information Technology
