Experience suggests that fully automated schema matching is infeasible, especially for n-to-m matches involving semantic functions. It is therefore advisable for a matching algorithm not only to do as much as possible automatically, but also to accurately identify the critical points where user input is maximally useful. Our matching algorithm combines several existing approaches with a new emphasis on using the context provided by the way elements are embedded in paths. A prototype tested on biological data (gene sequence, DNA, RNA, etc.) and on bibliographic data shows significant performance improvements from utilizing user feedback and context checks. In non-interactive mode on the purchase order schemas used in the COMA project, it compares favorably, and also correctly identifies critical points for user input.
The authors of these documents have submitted their reports to this technical report series for the purpose of non-commercial dissemination of scientific work. The reports are copyrighted by the authors, and their existence in electronic format does not imply that the authors have relinquished any rights. You may copy a report for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect the author's copyright. For information concerning the use of this document for other than research or instructional purposes, contact the authors. Other information concerning this technical report series can be obtained from the Computer Science and Engineering Department at the University of California at San Diego, firstname.lastname@example.org.