Site Reliability Engineer

Logile, Inc.

Khordha

On-site

INR 15,00,000 - 25,00,000

Full time

Job summary

A technology solutions provider is seeking a Site Reliability Engineer to ensure the reliability and performance of infrastructure. The ideal candidate will have experience with monitoring tools like Prometheus and Grafana, cloud expertise, and strong automation skills. This onsite role requires collaboration with teams and availability for overlap with US working hours. Compensation is competitive with industry standards.

Benefits

Shift allowances

Home pickup and drop off

Qualifications

2 - 5 years of experience with monitoring, logging, and tracing tools.
Proficient in Linux system administration and networking fundamentals.
Solid skills in infrastructure automation.

Responsibilities

Design and manage observability systems.
Define and maintain SLAs, SLOs, and SLIs.
Build automation for infrastructure and incident response.

Skills

Monitoring tools (Prometheus, Grafana)

Cloud expertise (AWS, Azure, GCP)

Linux system administration

Infrastructure automation (Terraform, Ansible)

Programming (Python, Go, Bash)

Kubernetes

CI/CD practices

Tools

Terraform

Ansible

ELK/EFK

Jaeger

Company Overview

Logile is the leading retail labor planning, workforce management, inventory management and store execution provider deployed in thousands of retail locations across North America, Europe, Australia, and Oceania.

Our proven AI, machine-learning technology and industrial engineering accelerate ROI and enable operational excellence with improved performance and empowered employees. Retailers worldwide rely on Logile solutions to boost profitability and competitive advantage by delivering the best service and products at optimal cost.

From labor standards development and modeling to unified forecasting, storewide scheduling, and time and attendance, to inventory management, task management, food safety, and employee self-service — we transform retail operations with a unified store-level solution. Gain the Advantage with The Logic of Retail. One Platform for store planning, scheduling and execution.

For more information, visit www.logile.com.

Job Summary

We are seeking a motivated and experienced Site Reliability Engineer (SRE) to join our dynamic engineering team. The ideal candidate will have a strong background in ensuring the reliability, scalability, and performance of infrastructure and applications. The SRE will focus on building robust monitoring systems, automating operations, and bridging the gap between development and operations to achieve high service availability.

Key Responsibilities

Design, implement, and manage observability systems (Prometheus, Grafana, ELK/EFK, Jaeger, OpenTelemetry).
Define and maintain SLAs, SLOs, and SLIs for services, ensuring reliability goals are met.
Build automation for infrastructure, monitoring, scaling, and incident response using Terraform, Ansible, and scripting (Python/Bash).
Collaborate with developers to design resilient and scalable systems following SRE best practices.
Lead incident management: monitoring alerts, root cause analysis, postmortems, and continuous improvement.
Implement chaos engineering and fault-tolerance testing to validate system resilience.
Drive capacity planning, performance tuning, and cost optimization across environments.
Ensure security, compliance, and governance in infrastructure monitoring.

Job Location & Schedule

This is an onsite role at the Logile Bhubaneswar office.
The selected candidate is expected to be available for some hours of overlap with US working hours.

Required Skills & Experience

2 to 5 years of strong experience with monitoring, logging, and tracing tools (Prometheus, Grafana, ELK, EFK, Jaeger, OpenTelemetry, Loki).
Cloud expertise: AWS, Azure, or GCP monitoring and reliability practices (CloudWatch, Azure Monitor).
Proficiency in Linux system administration and networking fundamentals.
Solid skills in infrastructure automation (Terraform, Ansible, Helm).
Programming/scripting skills: Python, Go, Bash.
Experience with Kubernetes and containerized workloads.
Proven track record in CI/CD and DevOps practices.

Preferred Skills

Experience with chaos engineering tools (Gremlin, Litmus).
Strong collaboration skills to drive SRE culture across Dev & Ops teams.
Experience with Agile/Scrum environments.
Knowledge of security best practices (DevSecOps).

Compensation And Benefits

The compensation and benefits for this role are benchmarked against the best in the industry and the job location.
Logile will provide applicable shift allowances and home pickup and drop-off.
