Scaling the Utilization Wall: The Case for Massively Heterogeneous Multiprocessors

Ikkjin Ahn, Nathan Goundling, John Sampson, Ganesh Venkatesh, Michael Taylor and Steve Swanson
CS2009-0947
September 3, 2009

Multi-core processors have emerged as the leading solution to the power and scalability concerns that processor designers currently face. This transition addresses microarchitectural scalability issues, but it only delays the onset of the power scalability problem. Due to limitations on threshold voltage scaling, within a few process generations processors will be able to use only a small fraction of a silicon die at full frequency at any one time. This “utilization wall” will prevent massively multi-core processors from effectively employing more than a small subset of their cores at once. If we cannot utilize the full array of homogeneous cores, then the utility of building them comes into question. This paper explores massively heterogeneous CMPs, an approach to processor design that can continue to scale performance in spite of the utilization wall. Such designs will comprise tens, hundreds, or even thousands of heterogeneous specialized processing elements (SPEs), ranging from small ASIC circuits to large speculative out-of-order general-purpose processors. Massively heterogeneous CMPs combine these SPEs with an execution model that allows each part of a program to run on the SPE that can execute it most efficiently. Although the utilization wall dictates that massively heterogeneous CMPs (like all future processors) may use only a small fraction of the die at once, they use that fraction very efficiently. We examine the architectural challenges that arise in designing general-purpose massively heterogeneous CMPs. Our results demonstrate that massively heterogeneous systems can extend performance scaling by realizing large gains (up to 7×) in performance and efficiency relative to more modestly heterogeneous and homogeneous designs. The paper also presents an ASIC-based SPE case study that demonstrates the ability of such systems to provide large efficiency gains even for irregular integer applications.


The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the author's copyright. For information concerning the use of this document for other than research or instructional purposes, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego, techreports@cs.ucsd.edu.

