A leading company is seeking a skilled professional for an AWS Cloud Reliability & Infrastructure Automation role. The role involves designing resilient AWS systems, automating resource provisioning, managing Kubernetes, and ensuring security compliance. Ideal candidates have strong DevOps skills and experience in infrastructure automation, and will help improve platform stability and reliability.
AWS Cloud Reliability & Infrastructure Automation:
- Design and maintain highly available, fault-tolerant AWS cloud infrastructure for customer data systems
- Automate AWS resource provisioning using Terraform and AWS CloudFormation
- Manage Kubernetes (EKS) clusters for containerized workloads and ensure they autoscale reliably
- Optimize CI/CD pipelines in Jenkins and AWS CodePipeline for faster, more reliable deployments
Monitoring, Performance & Incident Response:
- Implement real-time monitoring, logging, and alerting using Datadog, AWS CloudWatch, and Prometheus
- Define and track SLOs, SLIs, and error budgets to measure and improve AWS system reliability
- Conduct Root Cause Analysis (RCA) and post-mortems for incidents
Security, Compliance & API Reliability:
- Ensure GDPR, CCPA, and AWS security compliance in customer data storage and processing
- Implement AWS security best practices (IAM, Cognito, KMS, Shield, WAF) to protect user data
- Secure AWS infrastructure by configuring network security, VPCs, and automated security audits
Collaboration & Knowledge Sharing:
- Work closely with data engineers, marketing teams, and product managers to enhance platform stability
- Participate in Agile development, sprint planning, and technical documentation
- Mentor junior engineers and advocate for AWS SRE best practices across teams