In brief: GPUs have memory limitations when handling the demands of AI and HPC applications. There are ways around this bottleneck, but the solutions can be expensive and cumbersome. Now, a startup headquartered in Daejeon, South Korea, has developed a new approach: using PCIe-attached memory to expand capacity. Developing this solution required jumping through many technical hoops, and there are still challenges ahead. In particular, will AMD, Intel, and Nvidia support the technology?
Memory requirements stemming from advanced datasets for AI and HPC applications often swamp the memory built into a GPU. Expanding that memory has typically meant installing expensive high-bandwidth memory, which often introduces changes to the existing GPU architecture or software.
One solution to this bottleneck is being offered by Panmnesia, a company backed by South Korea's KAIST research institute, which has introduced new technology that allows GPUs to access system memory directly through a Compute Express Link (CXL) interface. Essentially, it enables GPUs to use system memory as an extension of their own memory.
Called CXL GPU Image, this PCIe-attached memory has double-digit-nanosecond latency, which the company says is significantly faster than traditional SSDs.
Panmnesia had to overcome several technical challenges to develop this system.
CXL is a protocol that runs on top of a PCIe link, but the technology has to be recognized by an ASIC and its subsystem. In other words, one cannot simply add a CXL controller to the tech stack, because GPUs have no CXL logic fabric or subsystems that support DRAM and/or SSD endpoints.
In addition, GPU cache and memory subsystems do not recognize any expansions except unified virtual memory (UVM), which is not fast enough for AI or HPC. In Panmnesia's tests, UVM delivered the worst performance across all the GPU kernels tested. CXL, however, provided direct access to the expanded storage through load/store instructions, eliminating the problems that hamper UVM, such as overhead from host runtime intervention during page faults and moving data at page granularity.
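To make the UVM-versus-load/store difference concrete, here is a minimal traffic model, not Panmnesia's benchmark. It assumes a 4 KiB UVM migration granularity and 64-byte cache lines for CXL accesses (both are illustrative parameters, not figures from the company), and it ignores eviction, prefetching and GPU caching. Under UVM, the first touch of each non-resident page faults to the host runtime, which migrates the whole page; with direct load/store, only the cache lines actually touched cross the link and the host is never involved. The function and type names are hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed UVM migration granularity in bytes (illustrative)
LINE_SIZE = 64    # assumed cache-line size for CXL load/store in bytes (illustrative)


@dataclass
class TrafficReport:
    bytes_moved: int         # bytes transferred over the link
    host_interventions: int  # times the host runtime had to step in


def uvm_traffic(addresses, page_size=PAGE_SIZE):
    """UVM-style model: first touch of a page faults and migrates the full page."""
    resident = set()
    faults = 0
    for addr in addresses:
        page = addr // page_size
        if page not in resident:
            faults += 1         # host runtime services the page fault
            resident.add(page)  # whole page now resides in GPU memory (no eviction modeled)
    return TrafficReport(bytes_moved=faults * page_size, host_interventions=faults)


def cxl_traffic(addresses, line_size=LINE_SIZE):
    """Direct load/store model: each distinct cache line is fetched once, no host involvement."""
    lines = {addr // line_size for addr in addresses}
    return TrafficReport(bytes_moved=len(lines) * line_size, host_interventions=0)


# Sparse access pattern: one small read every 16 KiB across 1 MiB (64 accesses).
accesses = list(range(0, 1 << 20, 16 * 1024))
print(uvm_traffic(accesses))  # TrafficReport(bytes_moved=262144, host_interventions=64)
print(cxl_traffic(accesses))  # TrafficReport(bytes_moved=4096, host_interventions=0)
```

For sparse access patterns, the page-level model moves 64 times more data and pays one host round trip per page, which is the kind of overhead the article attributes to UVM.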
In response, Panmnesia developed a set of hardware layers that support all the key CXL protocols (CXL.io, CXL.cache, and CXL.mem), consolidated into a unified controller.
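Why support all three protocols in one controller? The CXL specification defines device types by the sub-protocols they use: Type 1 devices (caching accelerators) use CXL.io and CXL.cache, Type 2 devices (accelerators with their own memory) use all three, and Type 3 devices (memory expanders) use CXL.io and CXL.mem. The sketch below encodes that mapping and checks which device types a controller could serve given the protocols it implements. It says nothing about how Panmnesia's controller is built internally; the names `servable_device_types` and `REQUIRED` are hypothetical.

```python
from enum import Enum


class CxlProtocol(Enum):
    IO = "CXL.io"        # PCIe-based discovery, configuration, interrupts
    CACHE = "CXL.cache"  # device coherently caches host memory
    MEM = "CXL.mem"      # host issues loads/stores to device-attached memory


# Protocols each CXL device type relies on, per the CXL specification.
REQUIRED = {
    "type1": {CxlProtocol.IO, CxlProtocol.CACHE},
    "type2": {CxlProtocol.IO, CxlProtocol.CACHE, CxlProtocol.MEM},
    "type3": {CxlProtocol.IO, CxlProtocol.MEM},
}


def servable_device_types(supported):
    """Return the device types whose required protocols are all in `supported`."""
    return sorted(t for t, needed in REQUIRED.items() if needed <= supported)


print(servable_device_types(set(CxlProtocol)))                 # ['type1', 'type2', 'type3']
print(servable_device_types({CxlProtocol.IO, CxlProtocol.MEM}))  # ['type3']
```

A controller that implements only CXL.io and CXL.mem can attach memory expanders, while one that implements all three can serve any endpoint type.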
The CXL 3.1-compliant root complex has multiple root ports that support external memory over PCIe, and a host bridge with a host-managed device memory (HDM) decoder that connects to the GPU's system bus and manages the system memory.
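An HDM decoder's basic job is address routing: deciding whether an address falls in a CXL-backed window and, if so, which root port and device address it maps to. The following is a simplified sketch of generic CXL-style interleaved decoding, not Panmnesia's design. It assumes each decoder covers one contiguous window with a fixed number of interleave ways and a fixed interleave granularity. The `HdmDecoder` fields, the `decode` function and the example base address are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HdmDecoder:
    base: int         # start of the address window this decoder claims
    size: int         # window size in bytes
    ways: int         # number of interleaved targets (root ports)
    granularity: int  # bytes sent to one target before moving to the next
    targets: tuple    # root-port IDs, one per interleave way


def decode(decoders, addr):
    """Map an address to (root_port, device_address), or None if no decoder claims it."""
    for d in decoders:
        if d.base <= addr < d.base + d.size:
            offset = addr - d.base
            chunk = offset // d.granularity      # which interleave chunk the address is in
            port = d.targets[chunk % d.ways]     # chunks rotate round-robin across ports
            # Each port sees every `ways`-th chunk, packed contiguously.
            dpa = (chunk // d.ways) * d.granularity + offset % d.granularity
            return port, dpa
    return None  # not CXL-backed: handled by local GPU memory instead


# Hypothetical 2 GiB window at 4 GiB, interleaved across two root ports in 256-byte chunks.
dec = HdmDecoder(base=0x1_0000_0000, size=2 << 30, ways=2, granularity=256, targets=(0, 1))
print(decode([dec], dec.base + 0x10))   # (0, 16)
print(decode([dec], dec.base + 0x300))  # (1, 256)
print(decode([dec], 0x1000))            # None
```

Interleaving spreads consecutive chunks across root ports so that streaming accesses can use the bandwidth of several links at once.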
Panmnesia also faces challenges beyond its control, a big one being that AMD and Nvidia must add CXL support to their GPUs. It's also possible that industry players will decide they like the approach of using PCIe-attached memory for GPUs and go on to develop their own technology.