Site Reliability Engineer

Infotree Global Solutions

Wes-Kaap

On-site

USD 40,000 - 80,000

Full time

10 days ago

Job summary

A leading company is seeking a Site Reliability Engineer to own AWS cloud reliability and infrastructure automation. The role involves designing resilient AWS systems, automating resource provisioning, managing Kubernetes clusters, and ensuring security compliance. Ideal candidates have strong DevOps skills and hands-on infrastructure automation experience, and will help improve platform stability and reliability.

Qualifications

  • Experience with AWS infrastructure and automation tools.
  • Knowledge of security compliance measures (GDPR, CCPA).
  • Strong background in monitoring tools and incident response.

Responsibilities

  • Design and maintain AWS cloud infrastructure.
  • Automate resource provisioning and manage Kubernetes clusters.
  • Implement monitoring and conduct incident analysis.

Skills

AWS CloudFormation
Terraform
Kubernetes
CI/CD
Security Compliance

Tools

DataDog
AWS CodePipeline
Prometheus

Job description

AWS Cloud Reliability & Infrastructure Automation:

- Design and maintain highly available, fault-tolerant AWS cloud infrastructure for customer data systems

- Automate AWS resource provisioning using Terraform and AWS CloudFormation

- Manage Kubernetes (EKS) clusters for containerized workloads and ensure autoscaling

- Optimize CI/CD pipelines in Jenkins and AWS CodePipeline for faster, more reliable deployments

Monitoring, Performance & Incident Response:

- Implement real-time monitoring, logging, and alerting using DataDog, AWS CloudWatch, and Prometheus

- Define and track SLOs, SLIs, and error budgets to measure and improve AWS system reliability

- Conduct Root Cause Analysis (RCA) and post-mortems for incidents

Security, Compliance & API Reliability:

- Ensure GDPR, CCPA, and AWS security compliance in customer data storage and processing

- Implement AWS security best practices (IAM, Cognito, KMS, Shield, WAF) to protect user data

- Secure AWS infrastructure by configuring network security, VPCs, and automated security audits

Collaboration & Knowledge Sharing:

- Work closely with data engineers, marketing teams, and product managers to enhance platform stability

- Participate in Agile development, sprint planning, and technical documentation

- Mentor junior engineers and advocate for AWS SRE best practices across teams
