Staff Site Reliability Engineer - Work from home

Nearsure

Rio de Janeiro

Remote

USD 80,000 - 120,000

Full-time

Posted yesterday

Job summary

Nearsure is looking for a Staff Site Reliability Engineer to join their remote team, focusing on optimizing observability and infrastructure reliability. In this role, you'll design and implement observability pipelines, automate processes for teams, and ensure effective monitoring across services. The ideal candidate will have substantial experience in cloud technologies, strong scripting skills, and a dedication to best practices in systems reliability.

Benefits

Competitive USD salary
100% remote work
Paid time off
National Holidays celebrated
Sick leave
Refundable Annual Credit
Team-building activities
Birthday day off

Qualifications

  • 8+ years of experience as a Site Reliability Engineer, with a focus on observability.
  • 5+ years of cloud experience, particularly with AWS.
  • Strong scripting abilities in Python or Go.

Responsibilities

  • Design and maintain observability pipelines with optimized ingestion strategies.
  • Build self-service automation tools for development teams.
  • Ensure client infrastructure reliability and adherence to security KPIs.

Skills

Automation
Observability
Scripting (Python, Go)
Cloud (AWS)
Infrastructure as Code (Terraform)
Kubernetes
Monitoring Tools (Grafana, Prometheus)

Education

Bachelor's Degree in Computer Science, Engineering, or a related field

Tools

Terraform
GitOps (ArgoCD, GitHub Actions)

Job description

Join our close-knit LATAM remote team: Connect through fun activities like coffee breaks, tech talks, and games with your team-mates and management.

Say goodbye to micromanagement! We champion autonomy, open communication, and respect for diversity as our core values.

Your well-being matters: Our People Care team is here from day one to support you with everything from time-off requests to wellness check-ins.

Plus, our Accounts Management team ensures smooth, effective client relationships, so you can focus on what you do best.

Ready to grow with us?

Here’s what we offer you by joining us!

Competitive USD salary – We value your skills and contributions!

100% remote work – While you can work from anywhere, you’re always welcome to connect with teammates and grow your network at our coworking spaces across LATAM!

Paid time off – Take the time you need according to your country’s regulations, all while receiving your full salary. Rest, recharge, and come back stronger!

National Holidays celebrated – Take time off to celebrate important events and traditions with loved ones, fully embracing your culture.

Sick leave – Focus on your health without the stress. Take the necessary time to recover and feel better.

Refundable Annual Credit – Spend it on the perks you love to enhance your work-life balance!

Team-building activities – Join us for coffee breaks, tech talks, and after-work gatherings to bond with your Nearsure family and feel part of our vibrant community.

Birthday day off – Enjoy an extra day off during your birthday week to celebrate in style with friends and family!

About the project:

As a Staff Site Reliability Engineer, you will own and optimize OpenTelemetry pipelines, enabling scalable and efficient observability. You’ll build tools that empower teams, support incident response, and drive best practices. Your work ensures a reliable, secure infrastructure and actionable alerting across the organization.

What your day-to-day work will look like

Design, implement, and maintain observability pipelines across the three main signals—logs, metrics, and traces—ensuring standardized, scalable, and efficient data ingestion. Optimize ingestion strategies to balance cost, performance, and usability.

Build self-service automation and tooling that enables development teams to instrument and leverage observability without requiring manual intervention from the SRE team. Drive adoption of best practices while ensuring teams own their telemetry.

Design the processes, playbooks, checklists, and automations that you and other engineers will follow during an incident.

Interact with members from almost all teams across the business to understand their monitoring, alerting, and SLO / SLA requirements and design systems and processes that ensure we meet or exceed these requirements. Influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development.

Leverage Infrastructure-as-Code (IaC) to provision and manage monitoring tools, alerting rules, and our observability configurations across OTEL Pipelines.

Design base-level requirements for new and existing services to ensure that all client infrastructure and code are monitored consistently and accurately at a basic level.

Take full ownership of client infrastructure reliability, ensuring adherence to key availability and security KPIs.

This would make you the ideal candidate

Bachelor's Degree in Computer Science, Engineering, or a related field.

8+ years of experience working as a Site Reliability Engineer or in a very similar role, with a focus on observability.

5+ years of experience working with cloud platforms (AWS).

5+ years of experience working with IaC tools (Terraform) and GitOps CI/CD solutions (ArgoCD, GitHub Actions, or similar).

4+ years of experience working with open-source monitoring and logging tools such as Grafana, Prometheus, Elastic/OpenSearch, Loki, and Tempo.

4+ years of experience working with Kubernetes, including its core components, deployment methodologies, and monitoring best practices.

Strong scripting abilities (Python, Go, or similar) for automating observability tasks.

Experience managing observability: SLIs, SLOs, log transformation, cardinality management, business and resilience metrics, the four golden signals, and distributed tracing.

Experience with automated alerting workflows.

Exposure to OpenTelemetry pipelines.

An advanced level of English is required for this role, as you will work with US clients. Effective communication in English is essential to deliver the best solutions to our clients and expand your horizons.

What to expect from our hiring process

1. Let’s chat about your experience!

2. Impress our recruiters, and you’ll move on to a technical interview with our top developers.

3. Nail that, and you’ll meet our client - your final step to joining our amazing team!

At Nearsure, we’re dedicated to solving complex business challenges through cutting-edge technology and we believe in the power of tailored solutions. Whether you are passionate about transforming businesses with Generative AI, building innovative software products, or implementing comprehensive enterprise platform solutions, we invite you to be part of our dynamic team!

We would love to hear from you if you are eager to make an impact and join a collaborative team that values creativity and expertise.

Let’s work together to shape the future of technology!

By applying to this position, you authorize Nearsure to collect, store, transfer, and process your personal data in accordance with our Privacy Policy. For more information, please review our Privacy Policy. (https://www.nearsure.com/privacy-policy)

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Software Development
