The MI350 is a gigantic chiplet-based AI GPU built from stacked silicon. Its two base tiles, called I/O dies (IODs), are each built on the 6 nm TSMC N6 process. Each IOD carries the microscopic wiring for up to four Accelerator Compute Die (XCD) tiles stacked on top of it, along with 128-channel HBM3E memory controllers, 256 MB of Infinity Cache memory, the Infinity Fabric interfaces, and a PCI-Express 5.0 x16 root complex. The XCDs are built on the 3 nm TSMC N3P foundry node. Each XCD contains a 4 MB L2 cache and four shader engines of 9 compute units each, for 36 CU per XCD and 144 CU per IOD. The two IODs are joined by a 5.5 TB/s bidirectional interconnect that maintains full cache coherency between them, giving the package a total of 288 CU. Each IOD controls four HBM3E stacks for 144 GB of memory, so the package has 288 GB in all.
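The CU and memory totals above follow from simple multiplication; a quick back-of-the-envelope check (all figures from the text, variable names ad hoc):

```python
# Illustrative arithmetic check of the MI350 package totals described above.
CUS_PER_SHADER_ENGINE = 9
SHADER_ENGINES_PER_XCD = 4
XCDS_PER_IOD = 4
IODS_PER_PACKAGE = 2
HBM3E_STACKS_PER_IOD = 4
GB_PER_HBM3E_STACK = 36  # 144 GB per IOD spread over 4 stacks

cus_per_xcd = CUS_PER_SHADER_ENGINE * SHADER_ENGINES_PER_XCD  # 36
cus_per_iod = cus_per_xcd * XCDS_PER_IOD                      # 144
cus_per_package = cus_per_iod * IODS_PER_PACKAGE              # 288

hbm_gb_per_iod = HBM3E_STACKS_PER_IOD * GB_PER_HBM3E_STACK    # 144 GB
hbm_gb_per_package = hbm_gb_per_iod * IODS_PER_PACKAGE        # 288 GB

print(cus_per_package, hbm_gb_per_package)  # → 288 288
```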
While the MI350 with its 288 CU and 288 GB of memory can function as a single GPU, AMD has devised ways for the GPU and its physical memory to be partitioned, both along the IODs and along the XCDs.
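The text does not enumerate the exact partition modes, but it implies three natural granularities: the whole package, per-IOD, and per-XCD. A hypothetical sketch of what each granularity would look like (partition counts derived from the die counts above; mode names are ad hoc, not AMD's):

```python
# Hypothetical partitioning sketch: splits the package evenly at each of the
# granularities the text implies. Assumes equal shares of CU and HBM per
# partition; the real partition modes may differ.
PACKAGE = {"cu": 288, "hbm_gb": 288}

# How many equal partitions each granularity yields: 1 package, 2 IODs, 8 XCDs.
GRANULARITY_COUNTS = {"package": 1, "iod": 2, "xcd": 8}

def partitions(granularity):
    """Return the list of equal partitions at the given granularity."""
    count = GRANULARITY_COUNTS[granularity]
    return [{"cu": PACKAGE["cu"] // count, "hbm_gb": PACKAGE["hbm_gb"] // count}
            for _ in range(count)]

for g in ("package", "iod", "xcd"):
    parts = partitions(g)
    print(f"{g}: {len(parts)} x {parts[0]}")
```

Per-XCD partitioning, for example, would yield eight slices of 36 CU each, with each slice seeing a 36 GB share of its IOD's 144 GB of HBM3E.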
At the platform level, each blade supports up to eight MI350 series GPUs, with memory pooling enabled across a point-to-point network of 153.6 GB/s links connecting every package to every other package on the node. In addition, each package has a PCI-Express 5.0 x16 link to one of the node's two EPYC “Turin” processors, which handle serial processing.
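In a fully connected eight-GPU node as described, the link and memory figures work out as follows (illustrative arithmetic only, derived from the numbers in the text):

```python
# Rough node-level figures implied above: a fully connected mesh of eight
# GPUs, each pair joined by a point-to-point 153.6 GB/s link.
GPUS_PER_NODE = 8
LINK_GBPS = 153.6          # GB/s per GPU-to-GPU link
HBM_GB_PER_GPU = 288

peer_links_per_gpu = GPUS_PER_NODE - 1                 # 7 peers each
total_links = GPUS_PER_NODE * peer_links_per_gpu // 2  # 28 unique links
pooled_memory_gb = GPUS_PER_NODE * HBM_GB_PER_GPU      # 2304 GB of pooled HBM3E

print(peer_links_per_gpu, total_links, pooled_memory_gb)  # → 7 28 2304
```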