Senior Site Reliability Engineer

Latitude.sh, Inc.

Brazil

Remote

BRL 120,000 - 160,000

Full-time

Today

Job summary

A global cloud infrastructure company is looking for a Senior Site Reliability Engineer to enhance platform reliability and automate operations. Responsibilities include designing tools for incident response and ensuring system resiliency. The ideal candidate has experience with Linux, Kubernetes, and observability stacks. This position offers competitive compensation and opportunities for professional growth.

Benefits

Contractor (PJ)
Paid Time Off
Competitive Compensation
Wellhub (formerly Gympass)
Annual Bonus based on performance
Opportunities for professional growth

Qualifications

  • Strong verbal and written English communication skills are essential.
  • Advanced knowledge of Linux/Unix systems in production environments.
  • Experience with Kubernetes and container orchestration is required.
  • Proficiency with infrastructure automation tools like Terraform or Ansible.
  • Experience with observability stacks like Prometheus or Grafana.
  • Familiarity with scripting languages such as Bash, Python, Go, or Ruby is useful.
  • Working knowledge of Git and CI/CD pipelines is necessary.
  • Solid understanding of incident management and root cause analysis.
  • Knowledge of cloud-native reliability and security best practices.

Responsibilities

  • Continuously improve platform reliability and performance.
  • Design, build, and maintain tools to automate operational tasks.
  • Implement observability solutions, including monitoring and alerting.
  • Collaborate with teams to design scalable and resilient systems.
  • Participate in on-call rotations and lead post-incident reviews.
  • Develop and document processes ensuring operational excellence.
  • Contribute to SLOs/SLIs definition and reliability metrics.

Skills

Strong verbal and written English communication skills
Advanced knowledge of Linux/Unix systems
Experience with Kubernetes
Proficiency with infrastructure automation tools
Experience with observability stacks
Familiarity with scripting languages
Working knowledge of Git and CI/CD
Solid understanding of incident management
Knowledge of reliability and security best practices

Job description

Latitude.sh's global computing platform was launched in 2019, enabling businesses to programmatically deploy single-tenant Bare Metal instances in different parts of the world.

We are a team of people passionate about hardware, software, and network infrastructure, working to build the fastest, easiest-to-use, developer-centric single-tenant cloud infrastructure. If you share this passion, join our growing team of talented people and help build the future of the Internet.

Summary

At Latitude.sh, the Reliability team is responsible for the health and resilience of the infrastructure that powers our global bare metal cloud. As a Senior Site Reliability Engineer (SRE), you’ll focus on building reliable, observable, and self-healing systems at scale.

SREs at Latitude.sh work at the intersection of software engineering and infrastructure. You’ll design and implement tools that automate operations, improve incident response, and enhance system observability—ensuring our platform is always ready for the workloads of our customers.

This might be a good opportunity if you’re passionate about reliability, automation, and creating cloud-like experiences for bare metal infrastructure.

Key Responsibilities
  • Continuously improve Latitude.sh’s platform reliability and performance
  • Design, build, and maintain tools to automate operational tasks and incident response
  • Implement and improve observability solutions, including monitoring, alerting, and tracing
  • Collaborate with engineering and platform teams to design scalable and resilient systems
  • Participate in on-call rotations and lead post-incident reviews with a focus on learning
  • Develop and document processes and runbooks that ensure operational excellence
  • Contribute to SLOs/SLIs definition and reliability metrics adoption across teams

Skills and Qualifications
  • Strong verbal and written English communication skills
  • Advanced knowledge of Linux/Unix systems in production environments
  • Experience with Kubernetes and container orchestration
  • Proficiency with infrastructure automation tools (e.g., Terraform, Ansible)
  • Experience with observability stacks (e.g., Prometheus, Grafana, Loki, ELK)
  • Familiarity with scripting and programming languages such as Bash, Python, Go, or Ruby
  • Working knowledge of Git and CI/CD pipelines
  • Solid understanding of incident management and root cause analysis processes
  • Knowledge of cloud-native reliability and security best practices

What do we offer?
  • Contractor (PJ)
  • Paid Time Off
  • Competitive Compensation
  • Wellhub (formerly Gympass)
  • Annual Bonus based on company and team performance
  • Opportunities for professional growth and development

Why Latitude.sh?

We're a lean, agile team of passionate professionals who believe in the power of innovation and creative problem-solving. As part of our team, you won't be lost in the crowd – you'll be an essential contributor, making a real impact from day one.

Our values at Latitude.sh guide us in all our work and partnerships. We're proud to be an inclusive company, and we welcome all applicants for our open positions, regardless of their background, religion, sexual orientation, gender identity, age, nationality, or disability. If these values speak to you, we'd love for you to become a part of our team.
