This paper investigates the idle time associated with a parallel computation, that is, the time that processors are idle because they are either waiting for data from other processors or waiting to synchronize with other processors. We study doubly-nested loops corresponding to parallelogram- or trapezoidal-shaped iteration spaces that have been parallelized by the well-known tiling transformation. We introduce the notion of rise r, which relates the shape of the iteration space to that of the tiles. For parallelogram-shaped iteration spaces, we show that when r -2, the idle time is linear in P, the number of processors, but when r -1, it is quadratic in P. In the context of hierarchical tiling, where multiple levels of tiling are used, a good choice of rise can lead to less idle time and better performance. While idle time is not the only cost that should be considered in evaluating a tiling strategy, current architectural trends (of deeper memory hierarchies and multiple levels of parallelism) suggest it has increasing importance.
The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the author's copyright. For information concerning the use of this document for other than research or instructional purposes, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego, firstname.lastname@example.org.
[ Search ]