Site Reliability Engineer, India

Jobgether

India

Remote

INR 9,00,000 - 12,00,000

Full time

Job summary

A technology staffing company is seeking a Site Reliability Engineer in India to help keep a cloud-native SaaS platform stable. The role involves improving system reliability on AWS and collaborating with engineering teams. Candidates should have at least 3 years of experience and proficiency in a programming language such as Python, Java, or Rust. The position offers opportunities for innovation in a remote/hybrid environment with flexible hours.

Benefits

Flexible working hours
Professional development opportunities
Competitive compensation

Qualifications

  • Minimum 3 years of experience managing highly available production systems.
  • Proficient in at least one programming language such as Python, Java, or Rust.
  • Experience with observability tools like Datadog or CloudWatch.
  • Hands-on experience with AWS services including Lambda and EC2.

Responsibilities

  • Strengthen the reliability, performance, and scalability of a multi-tenant SaaS platform.
  • Collaborate with engineering teams to diagnose incidents and implement solutions.
  • Automate repetitive tasks and operational processes.
  • Support CI/CD practices for smooth releases.

Skills

System reliability
Automation tools
Observability tools
Debugging
Python
AWS services
Linux systems

Education

Degree in Computer Science or Information Technology

Tools

AWS
Jenkins
Terraform

Job description

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer, India.

In this role, you will contribute to the stability, scalability, and resilience of a large cloud-native SaaS platform used by major global players in the media and broadcast sector. You will collaborate with high-performing engineering teams to enhance system reliability, improve observability, and automate workflows across a modern serverless environment. Working with cutting-edge AWS technologies, you will troubleshoot complex issues, optimize performance, and proactively strengthen platform health. The position offers the opportunity to innovate, experiment with new tools, and influence best practices across a rapidly evolving technical ecosystem. You will thrive in an environment that values creativity, ownership, and continuous learning.

Accountabilities
  • Strengthen the reliability, performance, and scalability of a multi‑tenant SaaS platform hosted in AWS with a serverless‑first architecture.
  • Collaborate closely with engineering teams to diagnose incidents, conduct root‑cause analysis, and implement sustainable long‑term solutions.
  • Enhance observability by leveraging monitoring, logging, and tracing tools to identify performance bottlenecks and prevent failures.
  • Automate repetitive tasks and operational processes through tools, scripts, and well‑designed software components.
  • Contribute to defining, measuring, and improving SLOs and SLIs to drive operational excellence.
  • Support CI/CD practices to ensure smooth, high‑velocity releases in a distributed engineering environment.
  • Participate in system improvements, platform modernization initiatives, and ongoing reliability‑focused engineering efforts.

Requirements
  • Minimum 3 years of experience managing highly available, mission‑critical production systems with a strong track record in reliability and uptime.
  • Proficiency in at least one programming language such as Python, Java, or Rust, with experience building automation tools or software libraries.
  • At least 3 years working with observability tools such as Datadog, CloudWatch, Honeycomb, Splunk, or New Relic, using metrics and logs to drive decisions.
  • Strong analytical and debugging abilities, with a deep understanding of system flows, architecture, and potential failure modes.
  • Hands‑on experience translating SLOs and SLIs into platform improvements.
  • Minimum 3 years of practical experience with AWS services, including CloudFormation, Lambda, DynamoDB, SQS, SNS, EC2, and S3, as well as the AWS CLI and the Boto3 SDK.
  • Solid grounding in Linux systems, networking fundamentals, and security principles.
  • Familiarity with CI/CD systems such as Jenkins or AWS CodePipeline.

Nice‑to‑have skills
  • Experience architecting and deploying serverless cloud applications.
  • Knowledge of infrastructure‑as‑code (IaC) tools such as Terraform or CloudFormation.
  • Previous participation in production on‑call rotations and incident management processes.
  • Expertise in optimizing AWS services such as Lambda, DynamoDB, API Gateway, SQS, EventBridge, and EC2.
  • Experience supporting systems with frequent deployment cycles in fast‑paced environments.
  • Familiarity with security compliance frameworks such as OWASP, ISO, CSA, or PCI.
  • Background in threat modeling, penetration testing, or security auditing.
  • Knowledge of advanced deployment patterns (canary, blue/green, A/B testing, red/black).
  • Hands‑on experience with chaos engineering practices.
  • Proven ability to champion reliability culture and operational excellence.

Experience

4 to 6+ years

Work mode

Remote/Hybrid

Office hours

1 pm to 9 pm IST

Benefits
  • Flexible working hours supporting work–life balance.
  • Opportunity to innovate and experiment with new technologies and tools.
  • Collaborative, global, and low‑bureaucracy engineering environment.
  • International exposure working with modern cloud‑native media technologies.
  • Professional development opportunities including mentoring and educational support.
  • Competitive compensation and comprehensive benefits package.