Title | 3D Die-Stacking: Challenges and Opportunities for Computer Architecture |
Author | *Gabriel H. Loh (Advanced Micro Devices, U.S.A.) |
Page | p. 355 |
Abstract | Three-dimensional die-stacking technologies are rapidly maturing, with intense research and development happening in the areas of manufacturing, EDA/CAD, test and yield improvement. The computer architecture research community is also starting to show great interest in 3D technology, and there are many opportunities and challenges. A first obvious direction for 3D integration is the incorporation of memory technologies alongside the processor. Even for this seemingly simple approach, many research and practical questions remain open. The large number of through-silicon vias can provide a high-bandwidth die-stacked memory interface, and there are many options for how this bandwidth may be utilized, such as providing many independent channels, very wide channels, more sophisticated command interfaces, etc. Modern high-performance processor microarchitectures have been designed to cope with a relatively low-bandwidth, high-latency memory interface, employing techniques such as speculative execution, out-of-order execution, hardware prefetching, etc. In a system employing die-stacked memories, such aggressive (and power-hungry!) techniques may not be necessary, or at least could be significantly scaled back. There are many research opportunities for better optimizing processor pipelines to match the bandwidth of stacked memories, possibly even to the point of designing entirely new microarchitectures. Beyond the stacking of memory on processors, 3D integration also introduces the opportunity for new compute organizations. In particular, conventional commodity processors can be combined with a variety of specialized accelerators and application-specific processing or other circuitry. Conventional co-processor organizations employ coarse task partitioning due to the relatively limited bandwidth between a host processor and the co-processor; fine-grained task partitioning requires frequent communication, which would eliminate the performance benefits. 
3D stacking can allow very tight cooperation between the processor and ASICs, reconfigurable logic/FPGAs, or any of a variety of custom-built accelerators (e.g., for signal processing, analog computing, machine learning, string matching/pattern recognition). All of these approaches lead to new and exciting compute platforms with the potential to greatly increase performance as well as performance-per-Watt. |