Site Reliability Engineer

Dealls – Jobs, CV & Mentoring

Surabaya ꦱꦸꦫꦧꦪ

On-site

IDR 300.000.000 - 400.000.000

Full time

13 days ago

Job summary

A technology-focused firm in Surabaya is seeking a skilled Site Reliability Engineer to enhance system reliability and performance. This role involves designing and implementing robust infrastructure, automating operations, and collaborating closely with development and security teams. Ideal candidates will have 3+ years of experience in Site Reliability Engineering and strong skills in cloud platforms and automation tools. The company offers a dynamic and collaborative work environment.

Qualifications

  • 3+ years of experience in Site Reliability Engineering, DevOps, or related fields.
  • Strong expertise in cloud platforms (GCP).
  • Proficiency in Kubernetes, Docker, Terraform, and IaC tools.
  • Experience with CI/CD pipelines using Jenkins, GitHub Actions, or similar tools.
  • Strong monitoring & logging experience (Prometheus, Grafana, ELK, Datadog).
  • Proficiency in scripting and automation (Python, Go, Bash, or similar).

Responsibilities

  • Monitor, troubleshoot, and optimize system performance.
  • Develop CI/CD pipelines and automated monitoring solutions.
  • Respond to system incidents and conduct root cause analysis.
  • Design solutions that scale with business growth.
  • Ensure systems comply with security practices and regulations.
  • Implement logging, metrics, and alerting tools.

Skills

Site Reliability Engineering
DevOps
Cloud platforms (GCP)
Kubernetes
Docker
Terraform
CI/CD pipelines
Python
Monitoring & logging
Networking

Tools

Prometheus
Grafana
Datadog
Jenkins
GitHub Actions

Job description

We are seeking a highly skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our systems. As a key member of the engineering team, you will be responsible for designing and implementing robust infrastructure, automating operations, and improving system resilience. You will collaborate closely with developers, DevOps, and security teams to create a highly available and fault-tolerant platform.

Responsibilities
  • Ensure System Reliability & Performance: Monitor, troubleshoot, and optimize system performance across cloud and on-prem infrastructure.
  • Automate Operations & Deployment: Develop CI/CD pipelines, infrastructure-as-code (IaC), and automated monitoring solutions.
  • Incident Management & Troubleshooting: Respond to system incidents, conduct root cause analysis, and implement long-term fixes.
  • Scalability & Capacity Planning: Design and implement solutions that scale with business growth and handle high traffic loads.
  • Security & Compliance: Ensure systems follow best security practices, comply with regulatory requirements, and protect against vulnerabilities.
  • Observability & Monitoring: Implement logging, metrics, and alerting tools (e.g., Prometheus, Grafana, Datadog) to improve system visibility.
  • Collaboration & Best Practices: Work with development teams to improve software reliability and establish best practices for high-availability systems.
Requirements
Technical Skills
  • 3+ years of experience in Site Reliability Engineering, DevOps, or related fields.
  • Strong expertise in cloud platforms (we are using GCP).
  • Proficiency in Kubernetes, Docker, Terraform, and infrastructure-as-code (IaC) tools.
  • Experience with CI/CD pipelines using Jenkins, GitHub Actions, ArgoCD, or similar tools.
  • Strong monitoring & logging experience (Prometheus, Grafana, ELK, Datadog).
  • Proficiency in scripting and automation (Python, Go, Bash, or similar).
  • Experience with networking, load balancers, and security best practices.
Soft Skills
  • Strong problem-solving and troubleshooting abilities.
  • Excellent communication and collaboration skills.
  • Ability to work in a fast-paced, high-availability environment.
  • Experience leading reliability initiatives and mentoring junior engineers.
Preferred Qualifications
  • Experience with service mesh (Istio, Linkerd).
  • Familiarity with database reliability (PostgreSQL, MySQL, Redis, etc.).
  • Previous experience in high-scale production environments (e.g., SaaS, fintech, e-commerce).