Principal Cloud Architect / GPU / HPC / AI Infrastructure

Oracle

Hong Kong

On-site

HKD 150,000 - 200,000

Full time

15 days ago

Job summary

A leading cloud services provider is seeking a Principal Cloud Architect to design and deploy GPU and HPC infrastructure on its cloud platform in Hong Kong. The ideal candidate should possess over 5 years of experience in technical consulting and demonstrate expertise in AI solutions. Fluency in Mandarin is essential due to customer interactions, along with strong skills in automation tools like Terraform and Ansible. This role offers a unique opportunity to work at the forefront of AI in cloud computing.

Benefits

Competitive salary
Training opportunities
Innovative work environment

Qualifications

  • 5+ years in pre-sales, technical consulting, or solution architecture.
  • Strong hands-on experience with AI/ML platforms.
  • Proficiency with multiple scripting languages.

Responsibilities

  • Design & deploy GPU/HPC infrastructure on OCI.
  • Collaborate with customers to size and tune infrastructure.
  • Build AI-ready platforms and support LLM-based solutions.

Skills

GPU or HPC clusters
Python
Automation
Kubernetes
Ansible

Tools

Terraform
Slurm
PowerShell

Job description

Principal Cloud Architect – GPU / HPC / AI Infrastructure

Function: Pre-Sales / Solution Architecture – OCI Accelerated Computing & AI

About the Role

Are you excited by large-scale AI, GPU clusters and next-generation cloud infrastructure?

As a Principal Cloud Architect, you will be at the forefront of helping our customers design and implement accelerated computing and AI platforms on Oracle Cloud Infrastructure (OCI).

You will work directly with AI startups, digital-native unicorns, and strategic enterprise customers to architect and deploy:

  • Large-scale GPU and HPC clusters
  • LLM training and inference platforms
  • Agentic AI and intelligent automation solutions

This role blends deep technical hands‑on work with customer‑facing solution consulting. You will partner closely with sales, product and engineering teams to shape our customers’ AI journey and contribute to Oracle’s strategic vision for cloud and AI adoption in the region.

What You Will Do

In this role, you will:

  • Design & deploy GPU/HPC infrastructure on OCI
      • Architect large-scale GPU and HPC clusters on OCI (and hybrid environments)
      • Use Terraform, Ansible, Slurm, Kubernetes and related tooling to build repeatable, automated deployments
      • Define cluster architecture including node types, storage layout, networking, and security
  • Build AI‑ready platforms
      • Support LLM‑based solutions, agentic AI systems, and robotic / intelligent systems from proof‑of‑concept to production
      • Collaborate with customers to size and tune infrastructure for training and inference workloads
      • Implement best practices for performance, reliability, observability and cost optimization
  • Be a trusted technical advisor
      • Work with CTOs, Heads of AI, and senior engineering leaders to translate business problems into scalable AI/HPC architectures
      • Provide guidance on cloud migration, hybrid deployments, and reference architectures for GPU/HPC workloads
      • Lead technical workshops, design sessions, and deep‑dive discussions with customer engineering teams
  • Drive customer enablement and internal advocacy
      • Deliver training, hands‑on labs, and technical enablement on OCI AI/HPC capabilities
      • Create and share code samples, deployment blueprints, reference architectures, and demos
      • Contribute to blogs, whitepapers, best‑practice guides, or conference talks to showcase solutions and thought leadership
  • Influence product & roadmap
      • Collaborate with product and engineering teams to close technical gaps, relay customer feedback, and help shape the future of OCI accelerated computing
      • Work with key AI partners and ISVs to integrate their solutions into customer architectures

Core Technical Requirements

To be successful, you should bring strong hands‑on experience in most of the following areas:

  • Practical experience designing or operating GPU or HPC clusters (cloud and/or on‑prem)
  • Understanding of cluster topology, GPU/CPU ratios, storage bandwidth, and scaling
  • Automation & Infrastructure as Code
      • Proficiency with Python, Bash, or PowerShell for scripting and tooling
      • Hands‑on experience with Terraform and/or Ansible to automate infrastructure and cluster provisioning
      • Experience with cluster managers and schedulers such as Slurm, PBS, or Bright
      • Strong understanding of Kubernetes / container orchestration for AI or batch workloads
  • High‑Performance Networking
      • Knowledge of RDMA, InfiniBand, MPI, and distributed file systems used in HPC environments
      • Experience troubleshooting or optimizing network and I/O bottlenecks in distributed workloads
  • AI / ML Platform Experience
      • Familiarity with AI/ML platforms, LLMs, and inference serving stacks (e.g. distributed training frameworks, model serving patterns)
      • Understanding of GPU utilization, mixed precision, and parallelism strategies (data/model/pipeline parallel)

Business & Leadership Skills

  • 5+ years in pre‑sales, technical consulting, customer‑facing solution architecture, or equivalent roles
  • Proven ability to present complex technical architectures to both deeply technical and senior business audiences
  • Strong skills in requirements discovery, solution design, and storytelling around value and outcomes
  • Comfortable leading design workshops, whiteboarding sessions, and technical decision‑making with customer engineering teams
  • Passion for working with top‑tier customers and partners to deliver innovative cloud and AI solutions
  • Ability to work independently with regional and global teams in a fast‑evolving AI/cloud landscape

Language & Location

  • Language:
      • Mandarin is mandatory, as many customers and partners in this role are Mandarin‑speaking.
      • English proficiency is also required for internal collaboration and regional stakeholders.
  • Location:
      • Based in Hong Kong, supporting customers across Greater China and regional markets as needed.

Preferred Qualifications

  • Demonstrated thought leadership in AI/HPC/cloud through publications, conference talks, community contributions, or open‑source projects
  • Experience architecting or operating solutions on Oracle Cloud Infrastructure (OCI) or other major cloud platforms (AWS, Azure, GCP)
  • Prior experience in AI/HPC solution pre‑sales or working directly with digital‑native / AI‑first companies