Senior Site Reliability Engineer

Latitude.sh, Inc.

Brazil

Remote

BRL 120,000 - 160,000

Full-time

Posted today

Job summary

A global cloud infrastructure company is looking for a Senior Site Reliability Engineer to enhance platform reliability and automate operations. Responsibilities include designing tools for incident response and ensuring system resiliency. The ideal candidate has experience with Linux, Kubernetes, and observability stacks. This position offers competitive compensation and opportunities for professional growth.

Benefits

Contractor (PJ)

Paid Time Off

Competitive Compensation

Wellhub (formerly Gympass)

Annual Bonus based on performance

Opportunities for professional growth

Qualifications

Strong verbal and written English communication skills are essential.
Advanced knowledge of Linux/Unix systems in production environments.
Experience with Kubernetes and container orchestration is required.
Proficiency with infrastructure automation tools like Terraform or Ansible.
Experience with observability stacks like Prometheus or Grafana.
Familiarity with scripting languages such as Bash, Python, Go, or Ruby is useful.
Working knowledge of Git and CI/CD pipelines is necessary.
Solid understanding of incident management and root cause analysis.
Knowledge of cloud-native reliability and security best practices.

Responsibilities

Continuously improve platform reliability and performance.
Design, build, and maintain tools to automate operational tasks.
Implement observability solutions, including monitoring and alerting.
Collaborate with teams to design scalable and resilient systems.
Participate in on-call rotations and lead post-incident reviews.
Develop and document processes ensuring operational excellence.
Contribute to SLOs/SLIs definition and reliability metrics.

Skills

Strong verbal and written English communication skills

Advanced knowledge of Linux/Unix systems

Experience with Kubernetes

Proficiency with infrastructure automation tools

Experience with observability stacks

Familiarity with scripting languages

Working knowledge of Git and CI/CD

Solid understanding of incident management

Knowledge of reliability and security best practices

Latitude.sh's global computing platform launched in 2019, enabling businesses to programmatically deploy single-tenant bare metal instances in locations around the world.

We are a team of individuals passionate about hardware, software, and network infrastructure, working to build the fastest, easiest-to-use, developer-centric single-tenant cloud infrastructure. If you share this passion, join our growing team of talented people and help build the future of the Internet.

Summary

At Latitude.sh, the Reliability team is responsible for the health and resilience of the infrastructure that powers our global bare metal cloud. As a Senior Site Reliability Engineer (SRE), you’ll focus on building reliable, observable, and self-healing systems at scale.

SREs at Latitude.sh work at the intersection of software engineering and infrastructure. You’ll design and implement tools that automate operations, improve incident response, and enhance system observability, ensuring our platform is always ready for our customers' workloads.

This might be a good opportunity if you’re passionate about reliability, automation, and creating cloud-like experiences for bare metal infrastructure.

Key Responsibilities

Continuously improve Latitude.sh’s platform reliability and performance
Design, build, and maintain tools to automate operational tasks and incident response
Implement and improve observability solutions, including monitoring, alerting, and tracing
Collaborate with engineering and platform teams to design scalable and resilient systems
Participate in on-call rotations and lead post-incident reviews with a focus on learning
Develop and document processes and runbooks that ensure operational excellence
Contribute to SLOs/SLIs definition and reliability metrics adoption across teams

Skills and Qualifications

Strong verbal and written English communication skills
Advanced knowledge of Linux/Unix systems in production environments
Experience with Kubernetes and container orchestration
Proficiency with infrastructure automation tools (e.g., Terraform, Ansible)
Experience with observability stacks (e.g., Prometheus, Grafana, Loki, ELK)
Familiarity with scripting and programming languages such as Bash, Python, Go, or Ruby
Working knowledge of Git and CI/CD pipelines
Solid understanding of incident management and root cause analysis processes
Knowledge of cloud-native reliability and security best practices

What we offer

Contractor (PJ)
Paid Time Off
Competitive Compensation
Wellhub (formerly Gympass)
Annual Bonus based on company and team performance
Opportunities for professional growth and development

Why Latitude.sh?

We're a lean, agile team of passionate professionals who believe in the power of innovation and creative problem-solving. As part of our team, you won't be lost in the crowd – you'll be an essential contributor, making a real impact from day one.

Our values at Latitude.sh guide us in all our work and partnerships. We're proud to be an inclusive company, and we welcome all applicants for our open positions, regardless of their background, religion, sexual orientation, gender identity, age, nationality, or disability. If these values speak to you, we'd love for you to become a part of our team.