Wide-area systems are gaining popularity as an infrastructure for running scientific applications. From a fault tolerance perspective, these environments are challenging because of their scale and their inherent variability. Causal message logging protocols have attractive properties that make them suitable for these environments. They spread fault tolerance information throughout the system, providing high availability. This information can also be used to replicate objects that are otherwise inaccessible due to network partitions. However, current causal message logging protocols do not scale to thousands or millions of processes. We describe the Hierarchical Causal Logging Protocol (HCML), which uses a hierarchy of shared logging sites, or proxies, to reduce the space requirements exponentially. These proxies also act as caches for fault tolerance information and reduce the overall message overhead of causal message logging protocols by as much as 50%. In addition, HCML leverages differences in bandwidth between communicating processes by piggybacking more fault tolerance information over high-bandwidth links. Doing so improves overall message latency by as much as 97%.
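The two mechanisms named above, proxies that cache fault tolerance information and bandwidth-aware piggybacking, can be illustrated with a minimal sketch. It is not the HCML implementation: the `Determinant` layout, the `Proxy` class, the bandwidth threshold, and the piggyback budgets are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    """Delivery-order record for one message (hypothetical layout)."""
    sender: int
    receiver: int
    send_seq: int
    recv_seq: int


@dataclass
class Proxy:
    """Hypothetical shared logging site that caches determinants."""
    cache: set = field(default_factory=set)


def piggyback_budget(bandwidth_mbps, threshold_mbps=10.0, low=1, high=8):
    """Assumed policy: attach more determinants on high-bandwidth links."""
    return high if bandwidth_mbps >= threshold_mbps else low


def select_piggyback(pending, proxy, bandwidth_mbps):
    """Choose determinants to attach to the next outgoing message.

    Determinants already cached at the proxy are skipped, so they are
    not sent again; the rest are capped by the link's budget.
    """
    budget = piggyback_budget(bandwidth_mbps)
    unsent = [d for d in pending if d not in proxy.cache]
    chosen = unsent[:budget]
    proxy.cache.update(chosen)  # the proxy now holds these determinants
    return chosen


if __name__ == "__main__":
    pending = [Determinant(0, 1, i, i) for i in range(5)]
    proxy = Proxy()
    slow = select_piggyback(pending, proxy, bandwidth_mbps=1.0)
    fast = select_piggyback(pending, proxy, bandwidth_mbps=100.0)
    print(len(slow), len(fast))
```

In this sketch a slow link carries a single determinant per message, while a fast link carries the rest of the backlog in one message, and anything the proxy already holds is never resent.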
The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the authors' copyright. For information concerning the use of this document for purposes other than research or instruction, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego, email@example.com.