JOB DESCRIPTION
Cloud/DevOps Engineer role requiring proficiency in scripting and programming languages such as Python, Bash, and PowerShell (optionally Go or Rust), along with strong expertise in Terraform, Terragrunt, Helm, Kubernetes, and Docker.
Responsibilities
- Architect and manage scalable, secure infrastructure on GCP, Azure, and occasionally OCI/AWS.
- Implement and manage Infrastructure as Code (IaC), primarily with Terraform and occasionally with Terragrunt and Helm.
- Design and optimize CI/CD workflows using GitHub Actions, Jenkins, and GitHub Enterprise (reusable workflows, OIDC federation).
- Ensure seamless deployment pipelines from code commit to production for microservices and AI workloads.
- Manage Docker containers and images using tools such as Portainer.
- Support canary releases, blue‑green deployments, and auto‑scaling strategies.
- Implement and manage serverless deployments on Google Cloud Platform (Cloud Functions, Cloud Run).
Resource Planning & Hardware Estimation
- Assist in hardware estimation for both on‑premises and cloud environments based on resource requirements such as the number of sensors and storage needs.
- Ensure robust backup strategies and data redundancy for all infrastructure.
- Assist the team in auditing cloud and on‑premises resources.
Security & Compliance
- Enforce cloud security best practices: image hardening, secret management, IAM least privilege, SBOMs, and vulnerability scanning.
- Collaborate on compliance requirements (SOC 2, ISO 27001), prepare proactively for audits, and respond promptly to incidents.
- Configure and manage Cloudflare for enhanced security and performance.
- Build and maintain observability stacks using Grafana, Prometheus, Loki, Tempo, Datadog, OpenTelemetry, and Sentry.
- Diagnose and resolve performance bottlenecks across compute, storage, and networking layers.
- Monitor and optimize cloud spending to ensure cost‑efficiency.
- Develop and implement disaster recovery plans, and conduct regular drills to ensure business continuity.
- Partner with engineers to embed DevOps best practices.
- Establish and enforce documentation standards for infrastructure, processes, and troubleshooting guides.
- Use Plane for sprint planning, incident tracking, and delivery visibility.