This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer based in India.
In this role, you will contribute to the stability, scalability, and resilience of a large cloud-native SaaS platform used by major global players in the media and broadcast sector. You will collaborate with high-performing engineering teams to enhance system reliability, improve observability, and automate workflows across a modern serverless environment. Working with cutting-edge AWS technologies, you will troubleshoot complex issues, optimize performance, and proactively strengthen platform health. The position offers the opportunity to innovate, experiment with new tools, and influence best practices across a rapidly evolving technical ecosystem. You will thrive in an environment that values creativity, ownership, and continuous learning.
Accountabilities
- Strengthen the reliability, performance, and scalability of a multi‑tenant SaaS platform hosted in AWS with a serverless‑first architecture.
- Collaborate closely with engineering teams to diagnose incidents, conduct root‑cause analysis, and implement sustainable long‑term solutions.
- Enhance observability by leveraging monitoring, logging, and tracing tools to identify performance bottlenecks and prevent failures.
- Automate repetitive tasks and operational processes through tools, scripts, and well‑designed software components.
- Contribute to defining, measuring, and improving SLOs and SLIs to drive operational excellence.
- Support CI/CD practices to ensure smooth, high‑velocity releases in a distributed engineering environment.
- Participate in system improvements, platform modernization initiatives, and ongoing reliability‑focused engineering efforts.
Requirements
- Minimum 3 years of experience managing highly available, mission‑critical production systems with a strong track record in reliability and uptime.
- Proficiency in at least one programming language such as Python, Java, or Rust, with experience building automation tools or software libraries.
- At least 3 years working with observability tools such as Datadog, CloudWatch, Honeycomb, Splunk, or New Relic, using metrics and logs to drive decisions.
- Strong analytical and debugging abilities, with a deep understanding of system flows, architecture, and potential failure modes.
- Hands‑on experience translating SLOs and SLIs into platform improvements.
- Minimum 3 years of practical experience with AWS services and tooling, including CloudFormation, Lambda, DynamoDB, SQS, SNS, EC2, and S3, as well as the AWS CLI and Boto3.
- Solid grounding in Linux systems, networking fundamentals, and security principles.
- Familiarity with CI/CD systems such as Jenkins or AWS CodePipeline.
Nice‑to‑have skills
- Experience architecting and deploying serverless cloud applications.
- Knowledge of Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
- Previous participation in production on‑call rotations and incident management processes.
- Expertise in optimizing AWS services such as Lambda, DynamoDB, API Gateway, SQS, EventBridge, and EC2.
- Experience supporting systems with frequent deployment cycles in fast‑paced environments.
- Familiarity with security standards and compliance frameworks such as OWASP, ISO, CSA, or PCI DSS.
- Background in threat modeling, penetration testing, or security auditing.
- Knowledge of advanced deployment patterns (canary, blue/green, A/B testing, red/black).
- Hands‑on experience with chaos engineering practices.
- Proven ability to champion reliability culture and operational excellence.
Experience
4 to 6+ years
Education
Degree in Computer Science or Information Technology
Work mode
Remote/Hybrid
Office hours
1 pm to 9 pm IST
Benefits
- Flexible working hours supporting work–life balance.
- Opportunity to innovate and experiment with new technologies and tools.
- Collaborative, global, and low‑bureaucracy engineering environment.
- International exposure working with modern cloud‑native media technologies.
- Professional development opportunities including mentoring and educational support.
- Competitive compensation and comprehensive benefits package.