Senior Site Reliability Engineer (SRE)

Avra

São Paulo

Remote

BRL 80.000 - 130.000

Full-time

10 days ago

Job summary

Join a forward-thinking company as a Senior Site Reliability Engineer, where you'll design and maintain cutting-edge infrastructure for an AI platform. This role is pivotal in ensuring the reliability and scalability of systems that process vast amounts of data and deliver real-time insights. You will collaborate with talented teams to tackle complex challenges in a flexible remote work environment, with competitive compensation and a comprehensive benefits package. If you're passionate about building resilient infrastructure and making a significant impact in the tech landscape, this opportunity is for you.

Benefits

Unlimited vacation
National health plan
Generous parental leave
Equity participation
Flexible work culture

Qualifications

  • 5+ years in Site Reliability Engineering or DevOps roles.
  • Hands-on Kubernetes experience in production environments.

Responsibilities

  • Design and implement fault-tolerant systems across multi-cloud environments.
  • Develop comprehensive monitoring and logging systems.

Skills

Kubernetes
Go
Python
AWS
GCP
Docker
OpenTelemetry
Prometheus
Grafana
ELK stack

Education

Bachelor's Degree in Computer Science or related field

Tools

Terraform
GKE
EKS

Job description

About Avra

Avra is a deep tech data intelligence platform powered by foundational AI that translates the complexity of SMEs into strategic decisions for large enterprises. We develop our own foundational models from the ground up—without relying on third-party solutions—to deliver innovative insights that empower some of the leading banks and fintechs across Latin America. Founded in 2024 by Bruno Alano (ex-OpenAI) and Viviane Meister, our team brings together expertise from NVIDIA, Palantir, Google, and more to drive real impact.

About The Role

As a Senior Site Reliability Engineer at Avra, you will be responsible for designing, building, and maintaining the infrastructure that powers our AI platform. You will play a crucial role in ensuring the reliability, scalability, and security of our systems as we process vast amounts of data and deliver real-time insights. Working closely with our engineering and data science teams, you will create resilient infrastructure that supports our heterogeneous graph neural networks and knowledge graph processing capabilities.

Responsibilities

  • Platform Reliability: Design and implement highly available, fault-tolerant systems across our multi-cloud environment (AWS and GCP) that support our graph processing and AI inference workloads.
  • Kubernetes Platform Engineering: Design, implement, and maintain our production Kubernetes environments on GKE and EKS, ensuring high availability, scalability, and security for our graph processing and AI inference workloads.
  • Observability & Monitoring: Develop comprehensive monitoring, alerting, and logging systems to ensure 99.9%+ uptime for critical services and provide visibility into system performance.
  • Infrastructure as Code: Create and maintain infrastructure as code using Terraform to automate provisioning and configuration management.
  • Performance Optimization: Identify and resolve performance bottlenecks in our distributed systems, particularly around graph processing and real-time inference workflows.
  • Security Engineering: Collaborate with security teams to implement robust security practices, supporting our ISO 27001 and NIST CSF 2.0 certification efforts.
  • CI/CD Pipeline Enhancement: Improve and maintain our continuous integration and deployment pipelines to support rapid, reliable software delivery.
  • Incident Response: Lead incident response efforts, conduct post-mortems, and implement systems to prevent recurrence of issues.

You Stand Out If

  • You have experience building and maintaining infrastructure for data-intensive or AI applications, particularly those involving graph processing or machine learning.
  • You have deep expertise with Kubernetes, including advanced concepts such as custom controllers, operators, network policies, and multi-cluster management.
  • You excel at designing scalable, distributed systems that can handle terabytes of data and millions of requests.
  • You are proficient with container orchestration tools such as Kubernetes and have experience managing deployments across AWS and GCP environments.
  • You have significant experience with GKE (Google Kubernetes Engine) and EKS (Amazon Elastic Kubernetes Service) in production environments.
  • You have implemented robust observability solutions and can effectively troubleshoot complex system failures.
  • You practice a security-first mindset and have experience implementing infrastructure security controls.
  • You are passionate about automation and eliminating toil through effective tooling.

Qualifications

  • Experience: 5+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles, with at least 3 years of hands-on Kubernetes experience in production environments.
  • Kubernetes Expertise: Proven experience managing Kubernetes at scale, including cluster architecture, security hardening, resource optimization, and upgrade management.
  • Technical Skills: Proficiency in programming (Go, Python, or similar), cloud platforms (AWS, GCP), containerization (Docker, Kubernetes), and monitoring technologies (OpenTelemetry, Prometheus, Grafana, ELK stack, etc.).
  • System Design: Strong understanding of distributed systems design, failure modes, and mitigation strategies.
  • Problem-Solving: Exceptional debugging skills and the ability to troubleshoot complex issues across the entire technology stack.
  • Collaboration: Excellent communication skills and ability to work effectively with cross-functional teams in a remote environment.

Why Join Avra?

  • Cutting-Edge Technology: Build infrastructure for a deep tech AI platform that processes data from millions of Brazilian companies to enable better business decisions.
  • Competitive Compensation: Attractive salary, equity participation, and full transparency in our compensation structure.
  • Direct Impact: Work closely with the founders to shape the infrastructure vision of a fast-growing startup.
  • Technical Challenges: Solve complex problems around graph processing, real-time inference, and large-scale data systems.
  • Flexible Work Culture: Enjoy the benefits of 100% remote work with access to an office in São Paulo, unlimited vacation, and a comprehensive benefits package including a national health plan and generous parental leave.

If you are passionate about building reliable, scalable infrastructure for AI systems and want to help us revolutionize how businesses make decisions about SMEs in Brazil, we'd love to hear from you. Apply now to join Avra and help us build the future of AI-powered business intelligence in Latin America.

  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
