Compiler and Hardware Predicated Dependency Analysis and Scheduling

Lorinda Carter
CS2002-0700
March 18, 2002

The Explicitly Parallel Instruction Computing (EPIC) architecture has been put forth as a viable way to achieve the instruction level parallelism (ILP) needed to keep increasing processor performance. The Itanium processor developed at Intel is an example of an EPIC architecture. One of the new features of the EPIC architecture is its support for predicated execution. Predicated execution replaces a branch with statements that define two predicate registers (one true and one false), depending on the condition in the replaced branch. Subsequent statements are then guarded by one of the predicates, depending upon whether they would have been on the taken or fall-through path of the branch. All statements begin execution, but an operation is committed only if the value of its guarding predicate is true. An advantage of predicated execution is that it can eliminate hard-to-predict branches by combining both paths of a branch into a single path. However, data dependence analysis (for the purpose of maintaining definition-use information) is significantly more complex for the resulting code. When the two paths of a branch are combined, definitions of the same logical registers (originally from different paths) are intermingled. This makes it difficult to determine which definition a use actually depends on. This dissertation presents both hardware (Disjoint Path Analysis) and compiler (Predicated Static Single Assignment) solutions for improving the data dependence analysis of predicated regions of code by collecting information on predicate relationships. Another feature of the EPIC architecture is its reduced hardware complexity. The EPIC philosophy is that the compiler should handle most of the dependence analysis and scheduling, both to simplify the processor and because the compiler has a broader view of the code. However, the compiler cannot fully anticipate run-time events such as cache misses. Consequently, it cannot always create a static schedule that mitigates the effects of the increased latency that might result. In this dissertation, we introduce Pending Functional Units (PFU), which allow a limited amount of dynamic scheduling with minimal additional hardware overhead.
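The dependence problem described in the abstract can be illustrated with a small toy model. The sketch below is not the dissertation's Disjoint Path Analysis or Predicated Static Single Assignment; it only assumes a simplified setting in which every operation writes an immediate to a register under an optional guard. All names (Op, define_predicates, execute, may_reach, OPS, DISJOINT) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """A toy predicated operation: dest = value, committed only if guard is true."""
    dest: str
    value: int
    guard: Optional[str] = None  # None means the operation is unguarded


def define_predicates(cond):
    """Stand-in for the compare that replaces a branch: one true, one false predicate."""
    return {"p_true": cond, "p_false": not cond}


def execute(ops, preds):
    """Every operation is visited, but only those with a true guard commit."""
    regs = {}
    for op in ops:
        if op.guard is None or preds[op.guard]:
            regs[op.dest] = op.value
    return regs


def may_reach(ops, reg, use_guard, disjoint):
    """Indices of definitions of reg that may reach a use placed after ops.

    disjoint is a set of frozenset pairs of predicates known never to be
    true together. This is a deliberately simplified rule set (assumption).
    """
    reaching = []
    for index in range(len(ops) - 1, -1, -1):  # walk backward from the use
        op = ops[index]
        if op.dest != reg:
            continue
        if op.guard is not None and frozenset((op.guard, use_guard)) in disjoint:
            continue  # definition and use can never both commit
        reaching.append(index)
        if op.guard is None or op.guard == use_guard:
            break  # this definition always commits when the use does
    return sorted(reaching)


# Both paths of "if (cond) r1 = 1; else r1 = 2;" merged into one predicated path.
OPS = [Op("r1", 1, "p_true"), Op("r1", 2, "p_false")]
DISJOINT = {frozenset(("p_true", "p_false"))}
```

Without predicate relationships, may_reach(OPS, "r1", "p_true", set()) has to treat both definitions as possible sources for a use guarded by p_true. Passing DISJOINT narrows the answer to the single definition on the same path, which is the kind of precision the abstract attributes to collecting information on predicate relationships.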



The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the author's copyright. For information concerning the use of this document for other than research or instructional purposes, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego, techreports@cs.ucsd.edu.


