Site Reliability Engineer (Remote Brazil)

Luxoft

Brazil

Remote

BRL 80,000 - 120,000

Full-time

Posted 30+ days ago

Job summary

An innovative IT services firm is seeking a Site Reliability Engineer to enhance observability and reliability for critical systems. In this exciting role, you will collaborate with product teams and DevOps engineers to implement monitoring tools and drive best practices in automation and incident management. Your expertise in Java, Linux, and SRE will be crucial in ensuring the performance and reliability of software products. Join a dynamic team where your contributions will make a significant impact on service delivery and operational excellence. If you are a critical thinker with a passion for solving complex problems, this opportunity is perfect for you.

Qualifications

  • Production support experience as a developer for an e-commerce platform, plus 5+ years administering Linux.
  • Strong knowledge of Java and SRE practices.
  • Experience with monitoring tools and automation.

Responsibilities

  • Implement monitoring and observability instrumentation for platforms.
  • Drive automation and best practices in site reliability.
  • Lead engineering efforts focusing on logs, metrics, and traces.

Skills

Java
Site Reliability Engineering (SRE)
Scripting
Linux Administration
TCP/IP Networking
Monitoring Tools
Virtualization and Containerization
Agile and DevOps Culture
Problem Solving
Communication Skills

Tools

Grafana
Datadog
Prometheus
Ansible
Terraform
Jenkins
Azure DevOps

Job description

Site Reliability Engineer (Remote Brazil)

Project Description:

Do you like working with existing and new software product development teams? In this position you will instrument end-to-end observability and visibility for business-critical systems through log ingestion, metrics, and traces. As a site reliability engineer (SRE), you will collaborate with product teams, infrastructure SMEs, DevOps engineers, and the proactive monitoring team to provide unique dashboards of relevant service-level analytics for various product stakeholders.

Responsibilities:

  1. Work closely with software product development teams (ITSO, Product Owner, SME) to implement monitoring & observability instrumentation within their platforms.
  2. Drive adoption of best practices in monitoring, alerting, automation, and site reliability.
  3. Lead/contribute to engineering efforts from design to implementation focusing on instrumentation of logs, metrics, and traces.
  4. Drive use of automation in software instrumentation as well as in response to service degradation events.
  5. Identify and execute on opportunities to implement instrumentation in pre-production environments.
  6. Proactively pursue continuous improvement and expansion in observability coverage, service reliability best practices, incident management, and problem management.

Mandatory Skills Description:

  • Production support experience as a developer for an e-commerce platform
  • Strong knowledge of and experience in Java
  • SRE experience
  • Scripting experience
  • 5+ years of experience administering Linux and at least 2 years supporting production environments
  • Experience designing large-scale distributed solutions, including their capacity planning
  • Deep understanding of TCP/IP networking
  • Familiarity with the terms SLA, SLO, and SLI
  • Experience with monitoring and alerting tools such as Grafana, Datadog, and Prometheus
  • Strong knowledge of virtualization and containerization principles, including orchestration tools
  • Familiarity with CaC and IaC tools (Ansible, Salt, Terraform, Packer)
  • Familiarity with CI/CD tools (Jenkins, Azure DevOps)
  • Experience with relational and NoSQL DBMS
  • A clear understanding of Agile and DevOps culture and the problems they are intended to solve
  • Strong written and verbal communication skills
  • Understanding of information security principles
  • Understanding of popular deployment strategies (feature flags, blue/green, canary, dark launch, etc.)
  • Critical thinker and problem solver

Nice-to-Have Skills Description:

  • Experience working with Azure
  • Previous experience working in SRE teams

Seniority level: Mid-Senior level

Employment type: Full-time

Job function: Information Technology

Industries: IT Services and IT Consulting
