Staff Architect, AI Infrastructure
San Jose, California, United States
Full Time Senior-level / Expert USD 168K - 184K
Supermicro
The premier provider of advanced Server Building Block Solutions® for 5G/Edge, Data Center, Cloud, Enterprise, Big Data, HPC and Embedded markets worldwide.
Job Req ID: 26676
About Supermicro:
Supermicro® is a Top Tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC and IoT/Embedded customers worldwide. We are the #5 fastest growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has provided us with the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.
Job Summary:
The Supermicro IT team is seeking a visionary Staff Architect, AI Infrastructure to lead the architecture and scaling of GPU-accelerated infrastructure optimized for AI and machine learning workloads. This role requires deep system-level expertise, strong automation skills, and hands-on experience designing infrastructure at scale. You will architect integrated compute, network, and cooling systems that support next-generation AI platforms while ensuring operational efficiency and future readiness.
Essential Duties and Responsibilities:
- Hyperscaler-Grade Infrastructure Design
Design and scale high-performance infrastructure inspired by hyperscalers (e.g., NVIDIA DGX SuperPOD, Meta RSC, Azure NDv5, AWS Trainium clusters), with a focus on modularity, density, and operability.
- System-Level Architecture
Lead the integration of compute, networking, storage, and power systems for high-density GPU workloads (NVIDIA, AMD, Intel Gaudi), ensuring system-wide performance optimization.
- Automation & Orchestration
Build and standardize infrastructure provisioning, deployment, and monitoring via infrastructure-as-code tools (Terraform, Ansible, Python), ensuring repeatability and scale.
- AI-Ready Network Design
Architect East-West GPU interconnects and North-South data ingress/egress paths using InfiniBand (HDR/NDR) and high-speed Ethernet (100G/400G), with support for VXLAN, BGP, and EVPN.
- Liquid & Air Cooling Infrastructure
Design and oversee deployment of air- and liquid-cooled racks, PDUs, containment solutions, and backup power systems tailored for thermally intensive AI workloads.
- Observability & Monitoring
Implement telemetry and health metrics to proactively manage system performance and lifecycle states.
- Infrastructure Documentation & Standards
Create robust documentation for reference architectures, operational playbooks, and lifecycle workflows to support global deployments.
- Cross-Functional Leadership
Collaborate with ML platform teams, data scientists, hardware architects, and facility engineers to align infrastructure capabilities with AI platform needs.
- Technology & Market Evaluation
Analyze and influence roadmap decisions by staying current on industry trends from NVIDIA, AMD, Intel, and cloud hyperscalers.
Qualifications:
- 10+ years in data center infrastructure or hyperscaler-scale compute environments, ideally with AI or HPC workloads
- Bachelor's degree or equivalent experience
- Proven success architecting GPU infrastructure using NVIDIA, AMD, or Intel Gaudi platforms
- Hands-on experience with large-scale data center deployments, including mechanical/electrical design and containment
- Strong automation experience
- Deep knowledge of RDMA, InfiniBand, Ethernet, and overlay networks
- Experience with bare-metal orchestration for GPU environments
- Experience with hyperscaler environments or colocation data centers supporting AI workloads
- Experience supporting AI/ML workloads across hybrid cloud environments
- Strong business acumen: able to balance performance, cost, and scalability in architecture decisions
Salary Range
$168,000 - $184,000
The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.
EEO Statement
Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.