Senior Site Reliability Engineer (SRE)

Avra

São Paulo

Remote

BRL 80,000 - 130,000

Full-time

Posted 30+ days ago

Job summary

Join a forward-thinking company as a Senior Site Reliability Engineer, where you'll design and maintain cutting-edge infrastructure for an AI platform. This role is pivotal in ensuring the reliability and scalability of systems that process vast data and deliver real-time insights. Collaborate with talented teams to tackle complex challenges in a flexible remote work environment, while enjoying competitive compensation and a comprehensive benefits package. If you're passionate about building resilient infrastructure and making a significant impact in the tech landscape, this opportunity is for you.

Benefits

Unlimited vacation

National health plan

Generous parental leave

Equity participation

Flexible work culture

Qualifications

5+ years in Site Reliability Engineering or DevOps roles.
Hands-on Kubernetes experience in production environments.

Responsibilities

Design and implement fault-tolerant systems across multi-cloud environments.
Develop comprehensive monitoring and logging systems.

Skills

Kubernetes

Python

AWS

GCP

Docker

OpenTelemetry

Prometheus

Grafana

ELK stack

Education

Bachelor's Degree in Computer Science or related field

Tools

Terraform

GKE

EKS

About Avra

Avra is a deep tech data intelligence platform powered by foundational AI that translates the complexity of SMEs into strategic decisions for large enterprises. We develop our own foundational models from the ground up—without relying on third-party solutions—to deliver innovative insights that empower some of the leading banks and fintechs across Latin America. Founded in 2024 by Bruno Alano (ex-OpenAI) and Viviane Meister, our team brings together expertise from NVIDIA, Palantir, Google, and more to drive real impact.

About The Role

As a Senior Site Reliability Engineer at Avra, you will be responsible for designing, building, and maintaining the infrastructure that powers our AI platform. You will play a crucial role in ensuring the reliability, scalability, and security of our systems as we process vast amounts of data and deliver real-time insights. Working closely with our engineering and data science teams, you will create resilient infrastructure that supports our heterogeneous graph neural networks and knowledge graph processing capabilities.

Responsibilities

Platform Reliability: Design and implement highly available, fault-tolerant systems across our multi-cloud environment (AWS and GCP) that support our graph processing and AI inference workloads.
Kubernetes Platform Engineering: Design, implement, and maintain our production Kubernetes environments on GKE and EKS, ensuring high availability, scalability, and security for our graph processing and AI inference workloads.
Observability & Monitoring: Develop comprehensive monitoring, alerting, and logging systems to ensure 99.9%+ uptime for critical services and provide visibility into system performance.
Infrastructure as Code: Create and maintain infrastructure as code using Terraform to automate provisioning and configuration management.
Performance Optimization: Identify and resolve performance bottlenecks in our distributed systems, particularly around graph processing and real-time inference workflows.
Security Engineering: Collaborate with security teams to implement robust security practices, supporting our ISO 27001 and NIST CSF 2.0 certification efforts.
CI/CD Pipeline Enhancement: Improve and maintain our continuous integration and deployment pipelines to support rapid, reliable software delivery.
Incident Response: Lead incident response efforts, conduct post-mortems, and implement systems to prevent recurrence of issues.
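
To put the 99.9%+ uptime target above in concrete terms, the following is a minimal sketch of the usual error-budget arithmetic. The function name and the 30-day window are assumptions chosen for illustration; the posting does not state how Avra measures uptime.

```python
# Illustrative only: converts an availability target into allowed downtime.
# The function name and the default 30-day window are assumptions,
# not something stated in the job posting.

def allowed_downtime_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Return the downtime budget in minutes for an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows about {budget:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30-day window, which is the budget that monitoring, alerting, and incident response would have to protect.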

You Stand Out If

You have experience building and maintaining infrastructure for data-intensive or AI applications, particularly those involving graph processing or machine learning.
You have deep expertise with Kubernetes, including advanced concepts such as custom controllers, operators, network policies, and multi-cluster management.
You excel at designing scalable, distributed systems that can handle terabytes of data and millions of requests.
You are proficient with container orchestration tools like Kubernetes and have experience managing deployments across AWS and GCP environments.
You have significant experience with GKE (Google Kubernetes Engine) and EKS (Amazon Elastic Kubernetes Service) in production environments.
You have implemented robust observability solutions and can effectively troubleshoot complex system failures.
You practice a security-first mindset and have experience implementing infrastructure security controls.
You are passionate about automation and eliminating toil through effective tooling.

Qualifications

Experience: 5+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles, with at least 3 years of hands-on Kubernetes experience in production environments.
Kubernetes Expertise: Proven experience managing Kubernetes at scale, including cluster architecture, security hardening, resource optimization, and upgrade management.
Technical Skills: Proficiency in programming (Go, Python, or similar), cloud platforms (AWS, GCP), containerization (Docker, Kubernetes), and monitoring technologies (OpenTelemetry, Prometheus, Grafana, ELK stack, etc.).
System Design: Strong understanding of distributed systems design, failure modes, and mitigation strategies.
Problem-Solving: Exceptional debugging skills and the ability to troubleshoot complex issues across the entire technology stack.
Collaboration: Excellent communication skills and ability to work effectively with cross-functional teams in a remote environment.

Why Join Avra?

Cutting-Edge Technology: Build infrastructure for a deep tech AI platform that processes data from millions of Brazilian companies to enable better business decisions.
Competitive Compensation: Attractive salary, equity participation, and full transparency in our compensation structure.
Direct Impact: Work closely with the founders to shape the infrastructure vision of a fast-growing startup.
Technical Challenges: Solve complex problems around graph processing, real-time inference, and large-scale data systems.
Flexible Work Culture: Enjoy the benefits of 100% remote work with access to an office in São Paulo, unlimited vacation, and a comprehensive benefits package including a national health plan and generous parental leave.

If you are passionate about building reliable, scalable infrastructure for AI systems and want to help us revolutionize how businesses make decisions about SMEs in Brazil, we'd love to hear from you. Apply now to join Avra and help us build the future of AI-powered business intelligence in Latin America.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering and Information Technology
