build-inverted-index.pl.
The indexes are stored on disk in the directory
Indexer/Indexes. At server startup, or on receipt of a
USR2 signal, these indexes are read into memory as
associative arrays. All database searches use these associative arrays.
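The load-at-startup and reload-on-USR2 behavior can be sketched as follows. This is a Python illustration, not the Dienst Perl code. The on-disk layout (one "key<TAB>docid docid ..." line per entry, one file per index) and the names INDEXES, load_index, reload_indexes, and install_reload_handler are assumptions made for the example.

```python
import os
import signal

INDEXES = {}  # hypothetical in-memory store: index name -> {key: [docids]}

def load_index(path):
    """Read one index file into a dict (assumed format: key<TAB>docid docid ...)."""
    index = {}
    with open(path, encoding="ascii") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            key, _, ids = line.partition("\t")
            index[key] = ids.split()
    return index

def reload_indexes(index_dir):
    """Replace the in-memory indexes with whatever is currently on disk."""
    fresh = {}
    for name in sorted(os.listdir(index_dir)):
        fresh[name] = load_index(os.path.join(index_dir, name))
    INDEXES.clear()
    INDEXES.update(fresh)

def install_reload_handler(index_dir):
    """Load once at startup, then reload whenever SIGUSR2 arrives (Unix only)."""
    reload_indexes(index_dir)
    if hasattr(signal, "SIGUSR2"):
        signal.signal(signal.SIGUSR2,
                      lambda signum, frame: reload_indexes(index_dir))
```

Building the new dictionaries first and swapping them in afterwards means a reload that fails partway through leaves the previous indexes in place.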
The system assumes that bibliographic data is available in
RFC 1807-formatted ASCII bibliography files. However, the only
RFC 1807-specific code in the server is in the file
Indexer/parse_bib_file.pl. It would not be difficult to
use a different bibliography format, as long as the supported search
fields (described below) are present in that format.
All communication between the server and the search engine flows
through the subroutines in the file
Indexer/indexer_interface.pl. The components of the
interface are described below. Using a new search engine would
require replacing the calls within these subroutines to the Dienst
database engine with semantically equivalent calls to the new search
engine.
Tagged_Search
publisher - a string that specifies the publisher of
matching documents. A match is successful if the input value matches a
leading sub-string of the document's publisher (e.g. "CO" matches
"COLUMBIA" and "CORNELL").
number - a string that specifies the document name of
matching documents. A match is successful if the input value matches
any sub-string of the document's name (e.g. "94" matches "94-1418"
and "68-194").
author - a string that specifies the first or last name(s) of the
authors of a matching document (see the rules
for bibliographic keyword matching below).
title - a string that specifies words in the title of a matching document (see the rules
for bibliographic keyword matching below).
abstract - a string that specifies words in the abstract of
a matching document (see the rules
for bibliographic keyword matching below).
any - a string that specifies words to match in each of the
bibliographic keyword fields (i.e. title,
abstract, and author). Thus
any=foo is equivalent to author=foo, title=foo, abstract=foo.
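The difference between the publisher and number rules can be sketched in a few lines. This is a Python illustration, not the Dienst Perl code. The function names are hypothetical, and exact, case-sensitive comparison is an assumption, since the text does not say how case is handled.

```python
def publisher_matches(query, publisher):
    """True if the query is a leading sub-string of the publisher name."""
    return publisher.startswith(query)

def number_matches(query, number):
    """True if the query occurs anywhere in the document name."""
    return query in number

# The examples from the field descriptions:
print(publisher_matches("CO", "COLUMBIA"), publisher_matches("CO", "CORNELL"))
print(number_matches("94", "94-1418"), number_matches("94", "68-194"))
```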
The bibliographic keyword fields (author, title, abstract) are matched
to bibliographic entries according to the following rules:
"robotics or vision" in the abstract field
will return documents that have the word "robotics" or
"vision" in their abstracts. "robotics and vision" in the
abstract field
will return documents that have both the word "robotics" and
"vision" in their abstracts. Multiple words that are not separated by
"and" are assumed to be "and" separated. For example,
"computer vision" in the abstract field will return
documents that have both the words "computer" and "vision" in their
abstracts. Finally, parentheses may be used to group words. For
example, "Gries or (Teitelbaum and Field)" in the
author field will return documents authored either by
"Gries" or by both "Teitelbaum" and "Field".
AND).
For example,
"or"ing "robot" in the title field with
"robotics" in the abstract field will return
documents that have either "robot" in their titles or "robotics" in
their abstracts. "And"ing these fields will
return only those documents that have "robot" in their titles and "robotics" in
their abstracts.
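The keyword syntax used within a single field ("or", "and", implicit "and" between adjacent words, and parentheses) can be sketched as a small recursive-descent evaluator. This is a Python illustration, not the Dienst Perl code. Treating "and" as binding more tightly than "or", matching case-insensitively, and the names tokenize and matches are all assumptions of the example.

```python
import re

def tokenize(query):
    """Split a query into lowercase words and parentheses."""
    return re.findall(r"\(|\)|[^\s()]+", query.lower())

def matches(query, words):
    """Evaluate a keyword query against a set of lowercase words from one field."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        result = parse_and()
        while peek() == "or":
            advance()
            rhs = parse_and()  # always parse the right side, then combine
            result = result or rhs
        return result

    def parse_and():
        result = parse_atom()
        # Adjacent words with no operator between them are "and"ed.
        while peek() not in (None, "or", ")"):
            if peek() == "and":
                advance()
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        tok = peek()
        if tok is None or tok in ("and", "or", ")"):
            raise ValueError("malformed query")
        advance()
        if tok == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        return tok in words

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result
```

For instance, matches("Gries or (Teitelbaum and Field)", {"teitelbaum"}) is false, because the parenthesized group requires both names.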
list - a reference to an array that will be filled with a
sorted list of the docids that match the search criteria.
terms - an associative array where keys are the
fields to be searched and values are the search criteria for the
respective fields (as described above).
and - a boolean which is true if the criteria in
terms should be "and"ed together and false if the
criteria should be "or"ed together.
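How the three arguments fit together can be sketched as set operations over per-field results. This is a Python illustration, not the Dienst Perl code. The search_field callback is hypothetical and stands in for the per-field index lookup. The sketch returns the sorted list, whereas the interface described above fills in an array passed by reference.

```python
def tagged_search(terms, and_flag, search_field):
    """Combine per-field matches into one sorted docid list.

    terms        -- dict mapping field name to its search criterion
    and_flag     -- True to intersect the per-field results, False to union them
    search_field -- hypothetical callback: (field, criterion) -> set of docids
    """
    results = [search_field(field, criterion)
               for field, criterion in terms.items()]
    if not results:
        return []
    combined = set(results[0])
    for docids in results[1:]:
        combined = combined & docids if and_flag else combined | docids
    return sorted(combined)

# Toy data standing in for the real indexes.
toy_index = {
    ("title", "robot"): {"doc1", "doc2"},
    ("abstract", "robotics"): {"doc2", "doc3"},
}

def lookup(field, criterion):
    return toy_index.get((field, criterion), set())

query = {"title": "robot", "abstract": "robotics"}
print(tagged_search(query, True, lookup))   # "and": documents matching both fields
print(tagged_search(query, False, lookup))  # "or": documents matching either field
```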
Get_Bib_Data
fields - a reference to an associative array whose
keys correspond to the bibliographic fields to be returned. The
supported bibliographic fields are:
TITLE - the title of the document.
AUTHOR - the author(s) of the document (multiple
authors are separated by a colon, ":").
ABSTRACT - the abstract of the document.
DATE - the entry date of the bibliography record.
NOTES - any descriptive notes about the document.
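Filling in the requested fields and splitting a colon-separated AUTHOR value can be sketched as follows. This is a Python illustration, not the Dienst Perl code. The record dictionary, the empty-string default for missing fields, and the names get_bib_data and split_authors are assumptions of the example.

```python
def get_bib_data(record, fields):
    """Fill each requested key of `fields` from one bibliographic record."""
    for name in fields:
        fields[name] = record.get(name, "")  # assumed default for absent fields
    return fields

def split_authors(author_field):
    """Split a colon-separated AUTHOR value into individual names."""
    return [name.strip() for name in author_field.split(":") if name.strip()]

# A made-up record, used only to exercise the two helpers.
record = {"TITLE": "An Example Report", "AUTHOR": "Author One:Author Two"}
wanted = get_bib_data(record, {"TITLE": None, "AUTHOR": None, "NOTES": None})
print(wanted["TITLE"], split_authors(wanted["AUTHOR"]))
```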
Get_All_IDs
list - a reference to an array that will be filled with a sorted list of the docids.
Get_All_Authors
list - a reference to an array that will be filled
with the list of author names.
Get_Indexer_Status
string - a reference to a string that will be filled
in with the HTML status document.