Microsoft Introduces a Groundbreaking AI Compute Platform with NVIDIA Blackwell and AMD EPYC Genoa with HBM
Microsoft has unveiled transformative upgrades to its Azure AI compute infrastructure during the recent “Ignite” event. The tech giant revealed its integration of NVIDIA’s advanced Blackwell GPUs and AMD’s fourth-generation EPYC Genoa processors equipped with custom high-bandwidth memory (HBM). These enhancements are set to propel Microsoft to the forefront of AI innovation.
Expanding Azure AI Services with Revolutionary Architectures
Microsoft has long leaned on its large compute footprint to compete in AI services. At Ignite, the company showcased the Azure ND GB200 V6 series, its first virtual machines powered by NVIDIA’s Blackwell GPUs.
The Azure ND GB200 V6 VMs feature dual GB200 Grace Blackwell Superchips. Each Superchip integrates two Blackwell GPUs and one Grace CPU, interconnected via NVIDIA’s NVLink, which gives each compute server four GPUs. Microsoft can host up to 18 of these compute servers in a single NVLink-connected system, for a total of 72 NVIDIA Blackwell GPUs per system. Scaling beyond a single system relies on NVIDIA’s InfiniBand fabric.
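The 72-GPU figure follows directly from the topology described above. As a quick check, here is a minimal Python sketch of that arithmetic. The constant and function names are illustrative only and do not correspond to any Azure or NVIDIA API, and the sketch assumes that each compute server carries exactly the two Superchips the article attributes to a VM.

```python
# Topology figures as stated in the article; all names here are hypothetical.
GPUS_PER_SUPERCHIP = 2      # Blackwell GPUs in one GB200 Superchip
CPUS_PER_SUPERCHIP = 1      # Grace CPUs in one GB200 Superchip
SUPERCHIPS_PER_SERVER = 2   # "dual GB200" (assumed: one VM maps to one server)
SERVERS_PER_SYSTEM = 18     # compute servers in one NVLink-connected system


def system_totals(servers: int = SERVERS_PER_SYSTEM) -> dict[str, int]:
    """Return GPU and CPU counts per server and per system."""
    gpus_per_server = GPUS_PER_SUPERCHIP * SUPERCHIPS_PER_SERVER
    cpus_per_server = CPUS_PER_SUPERCHIP * SUPERCHIPS_PER_SERVER
    return {
        "gpus_per_server": gpus_per_server,
        "cpus_per_server": cpus_per_server,
        "gpus_per_system": gpus_per_server * servers,
        "cpus_per_system": cpus_per_server * servers,
    }


if __name__ == "__main__":
    for name, count in system_totals().items():
        print(f"{name}: {count}")
```

Two Superchips with two GPUs each give four GPUs per server, and 18 servers give the 72 GPUs quoted above, alongside 36 Grace CPUs.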
High-Performance Azure HBv5 VMs for HPC Applications
In addition to GPU-powered solutions, Microsoft introduced the Azure HBv5 VM, a CPU-based virtual machine designed specifically for memory-bandwidth-intensive high-performance computing (HPC) applications. Developed in collaboration with AMD, it uses fourth-generation EPYC Genoa processors equipped with custom HBM to deliver far more memory bandwidth than conventional CPU-based servers.
The Azure HBv5 VM delivers the following specifications:
- Memory Bandwidth: Up to 6.9 TB/s across 400-450 GB of HBM3 memory.
- Configurable Memory Per Core: Up to 9 GB of memory per core.
- Processing Power: Up to 352 AMD EPYC Zen 4 cores with peak frequencies of 4 GHz.
- Infinity Fabric Bandwidth: 2x that of previous platforms.
- Networking: 800 Gb/s NVIDIA Quantum-2 InfiniBand, distributed as 200 Gb/s per CPU SoC.
- Azure Accelerated Networking: 160 Gb/s via second-generation Azure Boost NIC.
- Storage: 14 TB of local NVMe SSD offering up to 50 GB/s read and 30 GB/s write speeds.
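Several per-SoC and per-core figures can be derived from the list above. The following Python sketch works through that arithmetic. It is only a cross-check of the quoted numbers: the names are hypothetical, the SoC count is inferred from the InfiniBand split rather than stated in the article, and decimal units (1 TB = 1000 GB) are assumed.

```python
# Figures quoted in the specification list; all names here are hypothetical.
TOTAL_IB_GBPS = 800         # InfiniBand bandwidth per VM, Gb/s
IB_PER_SOC_GBPS = 200       # InfiniBand bandwidth per CPU SoC, Gb/s
TOTAL_CORES = 352           # maximum Zen 4 cores per VM
MEM_BW_TBPS = 6.9           # aggregate memory bandwidth, TB/s
MEM_GB_RANGE = (400, 450)   # HBM3 capacity range, GB
MAX_GB_PER_CORE = 9         # upper bound on configurable memory per core, GB


def derive_hbv5_figures() -> dict[str, object]:
    """Derive per-SoC and per-core numbers from the quoted specifications."""
    # Inferred, not stated: 800 Gb/s split as 200 Gb/s per SoC implies 4 SoCs.
    soc_count = TOTAL_IB_GBPS // IB_PER_SOC_GBPS
    return {
        "soc_count": soc_count,
        "cores_per_soc": TOTAL_CORES // soc_count,
        "mem_bw_per_soc_tbps": MEM_BW_TBPS / soc_count,
        # Decimal units assumed: 1 TB/s = 1000 GB/s.
        "mem_bw_per_core_gbps": MEM_BW_TBPS * 1000 / TOTAL_CORES,
        # Memory per core if all 352 cores are active.
        "gb_per_core_all_cores": tuple(gb / TOTAL_CORES for gb in MEM_GB_RANGE),
        # Core count at which memory per core reaches the 9 GB ceiling.
        "cores_at_max_gb_per_core": tuple(gb // MAX_GB_PER_CORE for gb in MEM_GB_RANGE),
    }


if __name__ == "__main__":
    for name, value in derive_hbv5_figures().items():
        print(f"{name}: {value}")
```

Under these assumptions the VM comprises four SoCs of 88 cores each, with about 19.6 GB/s of memory bandwidth per core when all cores are active. With all 352 cores enabled, memory works out to a little over 1 GB per core, so the 9 GB per core figure appears to apply to configurations with roughly 44 to 50 active cores. That last point is an inference from the arithmetic, not something the article states.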
The new VM supports scaling MPI workloads to hundreds of thousands of HBM-powered CPU cores through Azure Virtual Machine Scale Sets (VMSS) Flex, so memory-intensive HPC jobs can span many VMs.
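To put "hundreds of thousands of cores" in perspective, the short Python sketch below estimates how many HBv5 VMs an MPI job of a given size would need. It is plain arithmetic, not a call into the VMSS API. The function name and the 100,000-core target are illustrative assumptions, and it assumes every VM runs with the maximum 352 cores.

```python
CORES_PER_VM = 352  # maximum Zen 4 cores per HBv5 VM, per the article


def vms_needed(target_cores: int, cores_per_vm: int = CORES_PER_VM) -> int:
    """Return the smallest VM count whose combined cores reach target_cores."""
    if target_cores <= 0 or cores_per_vm <= 0:
        raise ValueError("core counts must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (target_cores + cores_per_vm - 1) // cores_per_vm


if __name__ == "__main__":
    target = 100_000  # illustrative job size, not a figure from the article
    count = vms_needed(target)
    print(f"{target} cores -> {count} VMs ({count * CORES_PER_VM} cores provisioned)")
```

For the illustrative 100,000-core target, this gives 285 VMs, which provision 100,320 cores in total.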
Unmatched Performance Enhancements
Microsoft reports almost 20x more memory bandwidth with the Azure HBv5 VMs than with earlier HB-series generations such as HBv3. Much of that gain comes from pairing AMD’s EPYC Genoa cores with HBM3 rather than conventional DDR memory, and it underscores AMD’s strong position against competitors like Intel in the server CPU market.
Revolutionizing AI Compute at Scale
By incorporating NVIDIA’s state-of-the-art GPUs and AMD’s cutting-edge CPUs, Microsoft is setting a new benchmark for AI compute platforms. These advancements ensure Azure remains a leading choice for businesses and researchers aiming to harness the full potential of artificial intelligence.
By Andrej Kovacevic
Updated on 21st November 2024