build-inverted-index.pl. The indexes are stored on disk in the directory
Indexer/Indexes. At server startup, or on receipt of a
USR2 signal, these indexes are read into memory as associative arrays. All database searches use these associative arrays.
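The load-at-startup and reload-on-signal behavior can be sketched as follows. This is a minimal Python illustration, not the server's own Perl code. The on-disk format (one `key<TAB>docid docid ...` entry per line) and the names `load_indexes`, `reload_indexes`, `install_reload_handler`, and `INDEXES` are assumptions made for the example.

```python
import os
import signal

INDEX_DIR = "Indexer/Indexes"   # directory named in the text
INDEXES = {}                    # hypothetical: index name -> {key: [docids]}

def load_indexes(index_dir=INDEX_DIR):
    """Read every index file in index_dir into an in-memory dictionary.

    Assumed file format (not taken from the server source): one entry
    per line, "key<TAB>docid docid ...".
    """
    loaded = {}
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        table = {}
        with open(path, encoding="ascii", errors="replace") as handle:
            for line in handle:
                key, sep, docids = line.rstrip("\n").partition("\t")
                if sep:  # skip lines that do not follow the assumed format
                    table[key] = docids.split()
        loaded[name] = table
    return loaded

def reload_indexes(signum=None, frame=None):
    """Signal handler: replace the in-memory indexes with a fresh copy."""
    global INDEXES
    INDEXES = load_indexes()

def install_reload_handler():
    """Load once at startup, then reload whenever SIGUSR2 arrives (Unix only)."""
    reload_indexes()
    signal.signal(signal.SIGUSR2, reload_indexes)
```

With a handler installed this way, sending `kill -USR2 <pid>` to the running process would refresh the in-memory tables without a restart.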
The system assumes that bibliographic data is available in
RFC 1807-formatted ASCII bibliography files. However, the only
RFC 1807 specific code in the server is in the file
Indexer/parse_bib_file.pl. It would not be difficult to
use a different bibliography format, as long as the supported search
fields (described below) are present in that format.
All communication between the server and the search engine flows
through the subroutines in the file
Indexer/indexer_interface.pl. The components of the
interface are described below. Using a new search engine would
require replacing the calls within these subroutines to the Dienst
database engine with semantically equivalent calls to the new search engine.
publisher- a string that specifies the publisher of matching documents. A match is successful if the input value matches a leading sub-string of the document's publisher (e.g. "CO" matches "COLUMBIA" and "CORNELL").
number- a string that specifies the document name of matching documents. A match is successful if the input value matches any sub-string of the document's name (e.g. "94" matches "94-1418" and "68-194").
author- a string that specifies authors' first or last name or names of a matching document (see the rules for bibliographic keyword matching below).
title- a string that specifies words in the title of a matching document (see the rules for bibliographic keyword matching below).
abstract- a string that specifies words in the abstract of a matching document (see the rules for bibliographic keyword matching below).
any- a string that specifies words to match in each of the bibliographic keyword fields (i.e. any=foo is equivalent to author=foo, title=foo, abstract=foo).
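The two string-matching rules in this list (leading sub-string for publisher, any sub-string for number) and the expansion of the any field can be illustrated with a short Python sketch. The function names are hypothetical, and the comparison is assumed to be case-insensitive, which the text does not state.

```python
def publisher_matches(query, publisher):
    """True if query is a leading sub-string of the publisher name."""
    # Case-insensitive comparison is an assumption of this sketch.
    return publisher.upper().startswith(query.upper())

def number_matches(query, number):
    """True if query occurs anywhere in the document number."""
    return query.upper() in number.upper()

def expand_any(value):
    """Expand any=value into the three keyword fields it stands for."""
    return {"author": value, "title": value, "abstract": value}
```

For instance, `publisher_matches("CO", "CORNELL")` and `number_matches("94", "68-194")` both hold, mirroring the examples given in the list.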
abstract) are matched to bibliographic entries according to the following rules:
abstract field will return documents that have the word "robotics" or "vision" in their abstracts. "robotics and vision" in the abstract field will return documents that have both the words "robotics" and "vision" in their abstracts. Multiple words that are not separated by "and" are assumed to be "and" separated. For example, "computer vision" in the abstract field will return documents that have both the words "computer" and "vision" in their abstracts. Finally, parentheses may be used to group words. For example, "Gries or (Teitelbaum and Field)" in the author field will return documents authored by "Gries" or by both "Teitelbaum" and "Field".
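The keyword rules above ("or", "and", implicit "and" between adjacent words, and parentheses) can be sketched as a small recursive-descent evaluator over an inverted index. This is an illustration in Python, not the server's implementation. The `search` and `tokenize` names, the word-to-docid-set index layout, case-insensitive matching, and "and" binding more tightly than "or" are all assumptions of the sketch.

```python
import re

def tokenize(query):
    """Split a query into words and individual parentheses."""
    return re.findall(r"[()]|[^\s()]+", query)

def search(query, index):
    """Return the set of docids whose field satisfies the boolean query.

    index maps a lowercased word to the set of docids containing it
    (an assumed layout for this sketch).
    """
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def parse_or():
        nonlocal pos
        result = parse_and()
        while peek() == "or":
            pos += 1
            result = result | parse_and()
        return result

    def parse_and():
        nonlocal pos
        result = parse_factor()
        # An explicit "and" is optional: adjacent words are implicitly "and"ed.
        while peek() not in (None, "or", ")"):
            if peek() == "and":
                pos += 1
            result = result & parse_factor()
        return result

    def parse_factor():
        nonlocal pos
        token = peek()
        if token is None or token in ("and", "or", ")"):
            raise ValueError("malformed query: %r" % query)
        pos += 1
        if token == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing ')' in query: %r" % query)
            pos += 1
            return result
        return set(index.get(token, ()))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected %r in query: %r" % (tokens[pos], query))
    return result
```

Given an author index in which "gries" maps to one document and "teitelbaum" and "field" share another, the query "Gries or (Teitelbaum and Field)" would return both documents.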
AND). For example, ORing "robot" in the title field and "robotics" in the abstract field will return documents that have either "robot" in their titles or "robotics" in their abstracts. ANDing these fields will return only those documents that have "robot" in their titles and "robotics" in their abstracts.
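Combining the per-field results amounts to a set intersection (AND) or a set union (OR). The following Python sketch assumes each field has already been searched and produced a set of docids. The `combine_fields` name and its argument layout are hypothetical.

```python
def combine_fields(per_field_hits, use_and):
    """Combine per-field docid sets with AND (intersection) or OR (union).

    per_field_hits maps a field name to the docids matching that field.
    Returns a sorted list of docids.
    """
    hit_sets = [set(hits) for hits in per_field_hits.values()]
    if not hit_sets:
        return []
    if use_and:
        combined = set.intersection(*hit_sets)
    else:
        combined = set.union(*hit_sets)
    return sorted(combined)
```

If the title search matched documents d1 and d2 and the abstract search matched d2 and d3, ANDing would yield only d2, while ORing would yield d1, d2, and d3.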
list- a reference to an array that will be filled with a sorted list of the docids that match the search criteria.
terms- an associative array where keys are the fields to be searched and values are the search criteria for the respective fields (as described above).
and- a boolean which is true if the criteria in terms should be "and"ed together and false if the criteria should be "or"ed together.
fields- a reference to an associative array whose keys correspond to the bibliographic fields to be returned. The supported bibliographic fields are:
TITLE- the title of the document.
AUTHOR- the author(s) of the document (multiple authors are separated by a colon (:)).
ABSTRACT- the abstract of the document.
DATE- the entry date of the bibliography record.
NOTES- any descriptive notes about the document.
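A caller that requests a subset of these fields might be served along the following lines. This Python sketch is illustrative only. The `select_fields` name and the dictionary-based record are assumptions, and splitting the colon-separated AUTHOR value into a list is a convenience added by the example rather than documented behavior.

```python
def select_fields(record, fields):
    """Return only the requested bibliographic fields of one record.

    record maps field names (TITLE, AUTHOR, ...) to string values.
    fields is any collection whose keys name the wanted fields.
    """
    selected = {}
    for name in fields:
        if name not in record:
            continue  # requested field is absent from this record
        value = record[name]
        if name == "AUTHOR":
            # Multiple authors are stored separated by a colon.
            value = [a.strip() for a in value.split(":") if a.strip()]
        selected[name] = value
    return selected
```

Requesting only TITLE and AUTHOR from a record that also carries an ABSTRACT would return a two-entry result, with the authors split into individual names.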
list- a reference to an array that will be filled with a sorted list of the docids.
list- a reference to an array that will be filled with the list of author names.
string- a reference to a string that will be filled in with the HTML status document.