A leading AI startup in Kuala Lumpur is seeking a Cloud/DevOps Engineer to architect and manage scalable, secure infrastructure on cloud platforms such as GCP and Azure. The ideal candidate is proficient in scripting and has strong expertise in Terraform and Kubernetes. The role involves implementing CI/CD workflows and enforcing cloud security best practices. The company offers competitive compensation and opportunities for growth.
Qualifications
Strong expertise in managing infrastructure on GCP, Azure, and occasionally OCI/AWS.
Ability to design and optimize CI/CD workflows.
Experience with managing Docker containers and implementing disaster recovery plans.
Responsibilities
Architect and manage scalable, secure infrastructure.
Implement and manage Infrastructure as Code primarily using Terraform.
Ensure seamless deployment pipelines for microservices.
Skills
Scripting proficiency (Python, Bash, PowerShell)
Terraform
Kubernetes
Docker
CI/CD workflows
Cloud security best practices
Tools
GitHub Actions
Jenkins
Cloudflare
Grafana
Prometheus
Job Description
Overview
Cloud/DevOps Engineer with scripting proficiency (e.g., Python, Bash, or PowerShell); experience with Go or Rust is a plus. Strong expertise in Terraform, Terragrunt, Helm, Kubernetes, and Docker.
About Company
Groundup.ai is a Singapore-based AI startup that helps companies reduce unplanned downtime of industrial assets without needing a huge learning curve or high-risk deployments on the ground.
Responsibilities
Architect and manage scalable, secure infrastructure on GCP, Azure, and occasionally OCI/AWS.
Implement and manage Infrastructure as Code (IaC) primarily using Terraform and occasionally with Terragrunt and Helm.
Design and optimize CI/CD workflows using GitHub Actions, Jenkins, and GitHub Enterprise (reusable workflows, OIDC federation).
Ensure seamless deployment pipelines from code commit to production for microservices and AI workloads.
Manage Docker containers and images using tools such as Portainer; support canary releases, blue-green deployments, and auto-scaling strategies.
Implement and manage serverless deployments on Google Cloud Platform (Cloud Functions, Cloud Run).
Plan resources and estimate hardware requirements for both on-premises and cloud environments, based on sensor and storage needs.
Ensure robust backup strategies and data redundancy; audit cloud and on-premises resources.
Security & compliance: enforce cloud security best practices (image hardening, secret management, IAM least privilege, SBOMs, vulnerability scanning) and collaborate on SOC 2 / ISO 27001 requirements; respond to audits and incidents proactively.
Configure and manage Cloudflare for security and performance; build and maintain observability stacks (Grafana, Prometheus, Loki, Tempo, Datadog, OpenTelemetry, Sentry).
Diagnose and resolve performance bottlenecks across compute, storage, and networking layers; monitor and optimize cloud spending for cost-efficiency.
Develop and implement disaster recovery plans with regular drills to ensure business continuity.
Partner with engineers to embed DevOps best practices and establish documentation standards for infrastructure, processes, and troubleshooting guides.
Use Plane for sprint planning, incident tracking, and delivery visibility.