Staff Architect, AI Infrastructure

Support Revolution

San Jose (CA)

On-site

USD 168,000 - 184,000

Full time

2 days ago

Job summary

A leading technology company is seeking a Staff Architect for AI Infrastructure in San Jose. The role involves designing and scaling GPU-accelerated infrastructure for AI and machine learning workloads, requiring deep system-level expertise and hands-on experience. The successful candidate will collaborate with cross-functional teams to ensure operational efficiency and future readiness while managing high-density GPU workloads.

Qualifications

10+ years in data center infrastructure or hyperscaler-scale compute environments.
Hands-on experience with large-scale data center deployments.

Responsibilities

Design and scale high-performance infrastructure inspired by hyperscalers.
Lead the integration of compute, networking, storage, and power systems.
Build and standardize infrastructure provisioning using infrastructure-as-code tools.

Skills

Automation

System-Level Expertise

Collaboration

Business Acumen

Education

Bachelor's degree

Tools

Terraform

Ansible

Python

Location: San Jose, California, United States

About Supermicro:

Supermicro is a top-tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest-growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has provided us with the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.

Job Summary:

The Supermicro IT team is seeking a visionary Staff Architect, AI Infrastructure, to lead the architecture and scaling of GPU-accelerated infrastructure optimized for AI and machine learning workloads. This role requires deep system-level expertise, automation skills, and hands-on experience designing infrastructure at scale. You will architect integrated compute, network, and cooling systems that support next-generation AI platforms while ensuring operational efficiency and future readiness.

Essential Duties and Responsibilities:

Hyperscaler-Grade Infrastructure Design: Design and scale high-performance infrastructure inspired by hyperscalers (e.g., NVIDIA DGX SuperPOD, Meta RSC, Azure NDv5, AWS Trainium clusters), with a focus on modularity, density, and operability.
System-Level Architecture: Lead the integration of compute, networking, storage, and power systems for high-density GPU workloads (NVIDIA, AMD, Intel Gaudi), ensuring system-wide performance optimization.
Automation & Orchestration: Build and standardize infrastructure provisioning, deployment, and monitoring via infrastructure-as-code tools (Terraform, Ansible, Python), ensuring repeatability and scale.
AI-Ready Network Design: Architect East-West GPU interconnects and North-South data ingress/egress paths using InfiniBand (HDR/NDR) and high-speed Ethernet (100G/400G), with support for VXLAN, BGP, and EVPN.
Liquid & Air Cooling Infrastructure: Design and oversee deployment of air- and liquid-cooled racks, PDUs, containment solutions, and backup power systems tailored for thermally intensive AI workloads.
Observability & Monitoring: Implement telemetry and health metrics to proactively manage system performance and lifecycle states.
Infrastructure Documentation & Standards: Create robust documentation for reference architectures, operational playbooks, and lifecycle workflows to support global deployments.
Cross-Functional Leadership: Collaborate with ML platform teams, data scientists, hardware architects, and facility engineers to align infrastructure capabilities with AI platform needs.
Technology & Market Evaluation: Analyze and influence roadmap decisions by staying current on industry trends from NVIDIA, AMD, Intel, and cloud hyperscalers.

Qualifications:

10+ years in data center infrastructure or hyperscaler-scale compute environments, ideally with AI or HPC workloads
Bachelor's degree or equivalent experience
Proven success architecting GPU infrastructure using NVIDIA, AMD, or Intel Gaudi platforms
Hands-on experience with large-scale data center deployments, including mechanical/electrical design and containment
Deep knowledge of RDMA, InfiniBand, Ethernet, and overlay networks
Experience with bare-metal orchestration for GPU environments
Experience with hyperscaler environments or colocation data centers supporting AI workloads
Experience supporting AI/ML workloads across hybrid cloud environments
Strong business acumen: able to balance performance, cost, and scalability in architecture decisions

Salary Range

$168,000 - $184,000

The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.

EEO Statement

Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.
