Microarchitectural simulation of multithreaded architectures with shared resources, such as simultaneous multithreading (SMT) cores and multi-core processors with shared caches, is time-consuming and the results of simulation may be dicult to interpret. It is time-consuming because modern benchmarks run for hundreds of billions (or even trillions) of instructions, and accurate multi-core and SMT simulation requires higher-detail models than single-threaded simulation. The statistics collected when two programs execute together can be dicult to interpret because the programs both exhibit independent phase behavior and affect each other's execution. Starting one program slightly later than during the original execution will change the phases that execute together and thus change the eects that the programs have on each other.
Accurate sampled simulation requires accurate sample collection. We evaluate techniques to improve sampling accuracy and performance, both for single-threaded and multithreaded simulation. These techniques include warming the CPU with detailed execution, storing cache state and techniques to minimize the size of checkpoints. Previous work showed that single-program performance can be accurately estimated by dividing execution into phases and only simulating representative samples from each phase. We demonstrate that the juxtaposition of phases (`co-phase') from a pair of programs has similar behavior to a single-threaded phase. Furthermore, simulation of all possible co-phases allows analysis of all distinct SMT behaviors and this comprehensive knowledge of program interactions can be combined with information about the sequence of phases executed by each program to reconstruct the combined execution of the programs from any given starting point. Given the short samples, the set of executions from all possible starting oof the programs. This removes the problem of interpreting the results of small numbers of experiments.
Finally, we propose three techniques for using the co-phase techniques to summarize the behavior of all possible interactions within a suite of benchmarks. We reduce the scale of this problem using Priciple Components Analysis, allowing our techniques to scale to large numbers of benchmarks an concentrate simulation on the most signicant behaviors.
The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the author's copyright. For information concerning the use of this document for other than research or instructional purposes, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego, firstname.lastname@example.org.
[ Search ]