Job Title: Site Reliability Engineer
Job Type: 12-month contract
Location: Remote
Job Description:
- Serve as a technical lead on projects with a cloud-first mindset. Design infrastructure solutions for the future.
- Collaborate with other senior SREs on our operational review board to refine project initiatives, both internal and cross-team.
- Mentor team members, providing the guidance and support you needed from a senior when you were a junior engineer.
- Automate processes to minimize human intervention.
- Build and maintain production infrastructure hosted on AWS using code.
- Design and develop pipelines to scale, deploy, and manage global infrastructure.
- Analyze complex system behavior, performance, and application issues.
- Develop observability tools, alerts, and runbooks.
- Perform capacity analysis and planning, traffic routing, and security policy implementation for SaaS applications.
- This role includes an on-call rotation of 8-hour shifts, 7 days a week, under a follow-the-sun model.
Minimum Qualifications:
- 7+ years of experience with Amazon Web Services (AWS).
- Experience provisioning large cloud environments using Infrastructure as Code (IaC) tools such as CloudFormation and Terraform.
- Experience with CI/CD automation tools such as Jenkins and GitLab CI/CD.
- Experience with container orchestration (Kubernetes) and microservices architectures.
- Strong experience with configuration management tools such as Puppet, Chef, or Salt.
- Experience building pipelines using GitOps methodologies and ChatOps.
- Experience with observability tools such as New Relic, Grafana, and CloudWatch.
- 7+ years of experience with Linux/UNIX systems administration.
- Knowledge of security design principles and their implementation in automation workflows.
- Proficiency in scripting languages such as Python, Ruby, Go, or Bash.
- Networking experience in large cloud environments.
- Experience working in high-volume or critical production service environments.