Summary:

The evolution of the semiconductor industry over the past two decades has been significantly shaped by the 3D integration of silicon dies. Notably, Through-Silicon Via (TSV) technology stands out as a pivotal development, allowing different types of compute and memory components to be stacked. This stacking has resulted in commercially viable products such as High-Bandwidth Memory (HBM), the Hybrid Memory Cube (HMC), and Wide I/O.

One of the prominent advantages of 3D memory stacking is a tenfold improvement in bandwidth over conventional off-package DRAM. Moreover, shorter wires mean lower power consumption. Despite these advantages, there are evident challenges. HBM-integrated GPUs, while offering higher bandwidth, are constrained in memory capacity. The escalating demand for more device memory, particularly from emerging workloads such as deep learning, further exacerbates this limitation. Adding more HBM modules could address the capacity issue, but only at the cost of significantly larger package sizes.

Recent research has considered vertically stacking 3D DRAM directly on top of GPUs as a solution. Combining 3D and 2.5D stacked DRAMs promises greater memory capacity and bandwidth. A substantial concern, however, is the heat generated by the GPU: it produces different temperatures across the layers of the 3D stacked DRAM, and hence inconsistent data retention times. This temperature variability can hurt performance, especially when multiple applications run simultaneously on a virtualized GPU, a common scenario in cloud computing.
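As background for why retention varies with temperature (this sketch is mine, not from the paper): DRAM cells leak charge faster when hot, so refresh schedules tighten with temperature. The 64 ms / 32 ms values below mirror the common JEDEC convention of halving the refresh interval above 85 °C; the per-layer temperatures are made-up illustrative numbers.

```python
# Illustrative sketch (not the paper's model): translating per-layer
# temperature into a per-layer refresh interval. Layers closer to the
# GPU die run hotter and must refresh more often, which costs bandwidth
# and effectively raises access latency for those layers.

def refresh_interval_ms(temp_c: float) -> float:
    """Return a retention-driven refresh interval for one DRAM layer."""
    return 64.0 if temp_c <= 85.0 else 32.0

# Hypothetical temperatures for a 4-layer stack sitting on a hot GPU:
# layer 0 is closest to the GPU die, layer 3 is farthest (coolest).
layer_temps = [92.0, 88.0, 83.0, 79.0]

intervals = [refresh_interval_ms(t) for t in layer_temps]
print(intervals)  # the two hotter layers refresh twice as often
```

The point of the sketch is only that a single stack no longer has one uniform refresh cost: each layer's cost depends on its position relative to the heat source.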

Although existing system software supports data placement based on non-uniform memory access (NUMA) latencies, that support does not account for the new thermal-induced latency differences in vertically stacked DRAMs. This paper introduces “Data Convection”, a novel approach to tackle this issue.

Key Contributions:

  1. The paper unveils a unique thermal-induced NUMA paradigm in systems where 3D stacked DRAMs are vertically aligned with GPUs.
  2. Detailed simulations reveal this NUMA behavior’s prevalence in 3D stacked DRAMs.
  3. A critical observation highlights the need for GPU runtime systems to be cognizant of thermal-induced NUMA behavior. Ignoring this behavior can have a detrimental effect on performance and energy.
  4. The proposed “Data Convection” technique effectively balances data placement, ensuring optimal performance in thermally diverse conditions. Experimental results indicate performance improvements ranging from 1.8% to 14.4%, based on different implementations of this technique.
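A minimal sketch of the kind of temperature-aware placement the “Data Convection” idea implies (the policy below is my simplification for illustration, not the authors' actual algorithm): rank the stack's layers by temperature and the application's pages by access frequency, then steer the most-accessed pages to the coolest layers, where retention is best and less bandwidth is lost to refresh.

```python
# Toy temperature-aware page placement (a simplification of the idea
# behind "Data Convection", not the paper's algorithm): frequently
# accessed pages are assigned to cooler DRAM layers.

def place_pages(page_access_counts, layer_temps):
    """Map page id -> layer index, hottest pages onto coolest layers.

    page_access_counts: dict {page_id: access_count}
    layer_temps: list of per-layer temperatures (index = layer id)
    """
    # Layer indices sorted coolest-first.
    layers_cool_first = sorted(range(len(layer_temps)),
                               key=lambda i: layer_temps[i])
    # Page ids sorted most-accessed-first.
    pages_hot_first = sorted(page_access_counts,
                             key=page_access_counts.get, reverse=True)
    # Round-robin the hot-to-cold pages over the cool-to-hot layers.
    return {page: layers_cool_first[i % len(layers_cool_first)]
            for i, page in enumerate(pages_hot_first)}

# Example: layer 0 sits on the GPU (hottest); page "a" is the hottest data.
placement = place_pages({"a": 900, "b": 500, "c": 40}, [92.0, 85.0, 78.0])
print(placement)  # {'a': 2, 'b': 1, 'c': 0}
```

A real runtime would also have to re-place data as temperatures drift with the workload mix, which is where the 1.8%–14.4% spread across the paper's different implementations presumably comes from.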

In essence, this study illuminates the challenges and potential solutions in the realm of 3D DRAM data placement, especially in the context of thermally-induced variabilities.
