Staff Architect, AI Infrastructure

Support Revolution

San Jose (CA)

On-site

USD 168,000 - 184,000

Full time

2 days ago

Job summary

A leading technology company is seeking a Staff Architect for AI Infrastructure in San Jose. The role involves designing and scaling GPU-accelerated infrastructure for AI and machine learning workloads, requiring deep system-level expertise and hands-on experience. The successful candidate will collaborate with cross-functional teams to ensure operational efficiency and future readiness while managing high-density GPU workloads.

Qualifications

  • 10+ years in data center infrastructure or hyperscaler-scale compute environments.
  • Hands-on experience with large-scale data center deployments.

Responsibilities

  • Design and scale high-performance infrastructure inspired by hyperscalers.
  • Lead the integration of compute, networking, storage, and power systems.
  • Build and standardize infrastructure provisioning using infrastructure-as-code tools.

Skills

Automation
System-Level Expertise
Collaboration
Business Acumen

Education

Bachelor's degree

Tools

Terraform
Ansible
Python

Job description

Location: San Jose, California, United States

About Supermicro:

Supermicro is a top-tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest-growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has given us the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.

Job Summary:

Supermicro's IT team is seeking a visionary Staff Architect, AI Infrastructure, to lead the architecture and scaling of GPU-accelerated infrastructure optimized for AI and machine learning workloads. This role requires deep system-level expertise, strong automation skills, and hands-on experience designing infrastructure at scale. You will architect integrated compute, network, and cooling systems that support next-generation AI platforms while ensuring operational efficiency and future readiness.

Essential Duties and Responsibilities:
  • Hyperscaler-Grade Infrastructure Design
    Design and scale high-performance infrastructure inspired by hyperscalers (e.g., NVIDIA DGX SuperPOD, Meta RSC, Azure NDv5, AWS Trainium clusters), with a focus on modularity, density, and operability.
  • System-Level Architecture
    Lead the integration of compute, networking, storage, and power systems for high-density GPU workloads (NVIDIA, AMD, Intel Gaudi), ensuring system-wide performance optimization.
  • Automation & Orchestration
    Build and standardize infrastructure provisioning, deployment, and monitoring via infrastructure-as-code tools (Terraform, Ansible, Python), ensuring repeatability and scale.
  • AI-Ready Network Design
    Architect East-West GPU interconnects and North-South data ingress/egress paths using InfiniBand (HDR/NDR) and high-speed Ethernet (100G/400G), with support for VXLAN, BGP, and EVPN.
  • Liquid & Air Cooling Infrastructure
    Design and oversee deployment of air- and liquid-cooled racks, PDUs, containment solutions, and backup power systems tailored for thermally intensive AI workloads.
  • Observability & Monitoring
    Implement telemetry and health metrics to proactively manage system performance and lifecycle states.
  • Infrastructure Documentation & Standards
    Create robust documentation for reference architectures, operational playbooks, and lifecycle workflows to support global deployments.
  • Cross-Functional Leadership
    Collaborate with ML platform teams, data scientists, hardware architects, and facility engineers to align infrastructure capabilities with AI platform needs.
  • Technology & Market Evaluation
    Analyze and influence roadmap decisions by staying current on industry trends from NVIDIA, AMD, Intel, and cloud hyperscalers.

Qualifications:
  • 10+ years in data center infrastructure or hyperscaler-scale compute environments, ideally with AI or HPC workloads
  • Bachelor's degree or equivalent experience
  • Proven success architecting GPU infrastructure using NVIDIA, AMD, or Intel Gaudi platforms
  • Hands-on experience with large-scale data center deployments, including mechanical/electrical design and containment
  • Deep knowledge of RDMA, InfiniBand, Ethernet, and overlay networks
  • Experience with bare-metal orchestration for GPU environments
  • Experience with hyperscaler environments or colocation data centers supporting AI workloads
  • Experience supporting AI/ML workloads across hybrid cloud environments
  • Strong business acumen: able to balance performance, cost, and scalability in architecture decisions

Salary Range

$168,000 - $184,000

The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.

EEO Statement

Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.

Similar jobs

DevOps Cloud Architect, Digital Engineering Solutions (Remote)

Lensa

San Jose

Remote

USD 121,000 - 213,000

Yesterday

Software Architect

Javelin

San Francisco

Remote

USD 100,000 - 250,000

3 days ago

Senior Principal Cloud Architect

Autodesk

San Francisco

Remote

USD 159,000 - 258,000

Yesterday

Cloud Solutions Architect - Alliances

Canonical

San Francisco

Remote

USD 119,000 - 180,000

3 days ago

Senior Principal Enterprise Portfolio Architect - State Government Solutions - Remote

Lensa

Springfield

Remote

USD 143,000 - 243,000

Today

Principal Architect – Enterprise AI / ML

DDN

Illinois

Remote

USD 150,000 - 200,000

2 days ago

RevOps Architect (Remote, US, Canada or LATAM)

Go Nimbly

San Francisco

Remote

USD 140,000 - 180,000

3 days ago