Recently, there has been extensive research that generated a wealth of new XML full-text query languages, ranging from simple Boolean search to combining sophisticated proximity and order predicates on keywords. While computing least common ancestors of query terms was proposed for efficient evaluation of conjunctive keyword queries by exploiting the document structure, no such solution was developed to evaluate complex full-text queries. We present efficient evaluation algorithms based on a formalization of XML queries in terms of keyword patterns and an algebra which manipulates pattern matches. Our algebra captures most existing languages and their varying semantics and our algorithms combine relational query evaluation techniques with the exploitation of document structure to process queries with complex full-text predicates. We show how scoring can be incorporated into our framework without compromising the algorithms complexity. Our experiments show that considering element nesting dramatically improves the performance of queries with complex full-text predicates.
The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the author's copyright. For information concerning the use of this document for other than research or instructional purposes, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego, firstname.lastname@example.org.
[ Search ]