Docking topical hierarchies: A comparison of two algorithms for reconciling keyword structures

Bryan Tower, Mark Chaisson and Richard Belew
CS2001-0669
April 26, 2001

Hierarchies are a natural way for people to organize information, as reflected by the common use of "broader/narrower" term relations in keyword thesauri. However, different people and organizations tend to construct different conceptual hierarchies (e.g., contrast Yahoo! with the UseNet news hierarchy), and while there are often significant commonalities, it is in general quite difficult to fully reconcile them. We are particularly interested in the problem of "docking" a narrower, more focused and refined topical hierarchy into a broader one, and we describe two algorithms for accomplishing this task. The first matches hierarchies with a bipartite matching algorithm over the (textual) features of nodes, without considering their hierarchic organization. The second is based on an attributed tree matching algorithm that uses both hierarchic structure and node features. We present experimental results showing the performance of both algorithms on a set of very different topical hierarchies, all designed to represent the field of Computer_Science. The results show that using hierarchic structure does indeed yield more accurate matches than node features alone.
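
As an illustration of the first, structure-free approach, the following minimal sketch docks a small hierarchy into a larger one using node features only. It is not the authors' implementation: the abstract does not specify the features or the matching procedure, so the keyword-set representation, the Jaccard similarity, the brute-force assignment search, and all node names and keywords below are assumptions chosen for illustration.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard overlap between two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dock_by_features(narrow, broad):
    """Assign each narrow node to a distinct broad node, maximizing total similarity.

    Hypothetical helper: brute force over all assignments, so it is only
    viable for tiny examples, and it assumes len(narrow) <= len(broad).
    Hierarchic structure is ignored entirely.
    """
    narrow_names = list(narrow)
    best_score, best_map = -1.0, {}
    for candidate in permutations(broad, len(narrow_names)):
        score = sum(
            jaccard(narrow[n], broad[b]) for n, b in zip(narrow_names, candidate)
        )
        if score > best_score:
            best_score, best_map = score, dict(zip(narrow_names, candidate))
    return best_map


# Hypothetical toy data: node name -> set of keywords describing the node.
narrow = {
    "Neural_Nets": {"neural", "network", "learning"},
    "Search": {"search", "heuristic", "planning"},
}
broad = {
    "Artificial_Intelligence": {"search", "planning", "heuristic", "reasoning"},
    "Machine_Learning": {"learning", "neural", "network", "statistics"},
    "Databases": {"query", "index", "transaction"},
}

mapping = dock_by_features(narrow, broad)
print(mapping)
```

For hierarchies of realistic size, the exhaustive search would need to be replaced by a polynomial-time assignment method; the sketch only shows what a one-to-one bipartite matching over node features looks like.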




The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the authors' copyright. For information concerning the use of this document for purposes other than research or instruction, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego, techreports@cs.ucsd.edu.

