NCSTRL Documentation
Maintaining your Dienst server
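The tasks described here are:
- Adding a new document to your database
- Monitoring the log files
- Generating log summary reports
- Checking the database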
In the following instructions, all directory names,
except when noted otherwise, are implicitly relative to the dienst
directory that you created when you downloaded the Dienst server.
Adding a new document
This section describes the manual steps required to add a new document
to your database.
The Dienst software release also includes a library management package, which
automates parts of these tasks. We recommend that you use it.
The steps are:
- Choose a document identification number (a complete handle)
for the document.
- Create a directory for the document.
- Create a bibliography file for the document, and put it
into the directory.
- Copy any existing formats of the document into the directory. (Adding new formats of a document is described
in detail in the installation documents.) (Optional)
- Create additional formats for the document. (Optional)
- Add the document to the Index Server's database.
- Register the handle for the new document. (Check with help@ncstrl.org if you need help with this step.)
More details
1. Choosing a document identification number
When you first installed Dienst, you created an initial bibliographic
database, in which each document received a unique identification number. Likewise, when adding a new document you
must assign it a new, unique document identifier. Dienst cannot choose this
identifier for you; only you can do this.
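If your site numbers its reports sequentially, a small script can at least suggest the next free number and catch accidental reuse. The Python sketch below is hypothetical: it assumes docids of the form `publisher:tr-NNN`, following the example mypub:tr-259 used in step 5, and that you can supply the list of docids already in your database. Dienst does not provide this function, and the final choice is still yours.

```python
import re

# Hypothetical docid shape "publisher:tr-NNN", assumed from the example
# mypub:tr-259; change the pattern if your site numbers documents differently.
DOCID_RE = re.compile(r"^(?P<pub>[A-Za-z0-9_-]+):tr-(?P<num>\d+)$")


def next_docid(publisher, existing):
    """Suggest the docid after the highest one `publisher` already uses."""
    numbers = [
        int(m.group("num"))
        for m in (DOCID_RE.match(d) for d in existing)
        if m and m.group("pub") == publisher
    ]
    return f"{publisher}:tr-{max(numbers, default=0) + 1}"


def is_unused(candidate, existing):
    """True if `candidate` has the assumed shape and is not already taken."""
    return DOCID_RE.match(candidate) is not None and candidate not in set(existing)


existing = ["mypub:tr-257", "mypub:tr-258", "otherpub:tr-900"]
print(next_docid("mypub", existing))        # mypub:tr-259
print(is_unused("mypub:tr-258", existing))  # False
```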
2. Creating a directory for the document
The LibMgt/install_tr tool (see step 3 below) creates this directory for you, and can be customized to your site's specific needs for installing a new document in your repository.
3. Creating a bibliography file and a directory
Each document in the Dienst database must have an
RFC-1807-formatted, ASCII, tagged bibliographic file. This file must be in the proper directory in your repository's document database.
The Submit tool helps in this process. It provides a form
for a document author to fill out, and generates a partial
bibliography file in a temporary working directory. Then you can use
the LibMgt/install_tr tool, which finishes the bibliography
file,
creates an appropriate directory, and copies the bib file into the directory.
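To show what such a file looks like, or to sanity-check one before installing it, the Python sketch below parses an RFC-1807-style record into tag/value pairs and reports lines that look like a tag written with a single colon, plus missing required fields. The required set (BIB-VERSION, ID, ENTRY, END) and the rule that BIB-VERSION comes first and END last are taken from RFC 1807, not from Dienst; treating untagged lines as continuations of the previous field is an assumption. Dienst's own checks (such as the "Possible malformed tag" warning) may differ, and the sample record is invented.

```python
import re

# A field starts with an upper-case tag followed by "::", e.g. "TITLE:: ...".
TAG_RE = re.compile(r"^([A-Z][A-Z0-9-]*)::\s?(.*)$")
# Starts like a tag but has a single colon, e.g. "TITLE: ..." (suspicious).
SUSPECT_RE = re.compile(r"^[A-Z][A-Z0-9-]*:(?!:)")
# Mandatory fields in RFC 1807; Dienst may expect additional ones.
REQUIRED = ("BIB-VERSION", "ID", "ENTRY", "END")


def parse_rfc1807(text):
    """Parse an RFC-1807-style record.

    Returns (fields, problems): fields is a list of [tag, value] pairs in
    file order; problems is a list of human-readable messages.
    """
    fields, problems = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        m = TAG_RE.match(line)
        if m:
            fields.append([m.group(1), m.group(2).strip()])
        elif SUSPECT_RE.match(line) or not fields:
            problems.append(f"line {lineno}: possible malformed tag: {line!r}")
        else:
            # Assumption: an untagged line continues the previous field.
            fields[-1][1] = (fields[-1][1] + " " + line.strip()).strip()

    tags = [tag for tag, _ in fields]
    for tag in REQUIRED:
        if tag not in tags:
            problems.append(f"missing required field {tag}")
    if tags and tags[0] != "BIB-VERSION":
        problems.append("BIB-VERSION is not the first field")
    if tags and tags[-1] != "END":
        problems.append("END is not the last field")
    return fields, problems


sample = """\
BIB-VERSION:: CS-TR-v2.1
ID:: MYPUB//TR-259
ENTRY:: August 28, 1995
TITLE: An Example Report
ABSTRACT:: This abstract
spans two lines.
"""

fields, problems = parse_rfc1807(sample)
for p in problems:
    print(p)
# line 4: possible malformed tag: 'TITLE: An Example Report'
# missing required field END
# END is not the last field
```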
4. Copy any existing formats
Copy (or make links to) any available formats of the document into the
directory (in the correct location for that format). The
LibMgt/install_tr tool copies PostScript automatically, but you
must copy any other formats.
5. Create additional formats
Create additional formats
(e.g. thumbnail images) with the LibMgt/db_build tool. A typical use
(assuming the new docid is mypub:tr-259) is:
LibMgt/db_build -docid mypub:tr-259
LibMgt/db_build creates all
possible additional formats, unless you specifically ask it not to.
See the online documentation for more information.
6. Updating the inverted indexes
To add the new document(s) to your server's inverted indexes,
run:
Indexer/build-inverted-indexes.pl <docid1> <docid2> ...
where the arguments <docid1> <docid2> ... are the
document ids of the document(s) you are adding. There is no limit to
the number of docids supplied as arguments. However, for a large
number of docids you may find it easier to create a file in which each
line is a docid to be added, and run
Indexer/build-inverted-indexes.pl -f <docfile>
where <docfile> is the name of the file.
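For the -f form, the docfile can be generated from whatever list of new docids you already have. A minimal Python sketch follows; the helper name `write_docfile` and the de-duplication step are illustrative choices, not part of Dienst. It writes one docid per line, as described above, and returns the indexer command instead of running it, so you can review it first.

```python
from pathlib import Path


def write_docfile(docids, path):
    """Write one docid per line to `path`, skipping blanks and duplicates
    (first occurrence wins), and return the indexer command as a list."""
    seen, unique = set(), []
    for docid in (d.strip() for d in docids):
        if docid and docid not in seen:
            seen.add(docid)
            unique.append(docid)
    Path(path).write_text("".join(d + "\n" for d in unique))
    return ["Indexer/build-inverted-indexes.pl", "-f", str(path)]
```

For example, `cmd = write_docfile(new_ids, "new-docids.txt")` followed by `subprocess.run(cmd, check=True)`, run from the dienst directory, would index all of the hypothetical `new_ids` in one call.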
Messages from build-inverted-indexes.pl are appended to the
log file indexer.log in the logs directory. Check this log for any
error messages from the update of your database.
If there are none, the final step is
to force the Dienst server to reload the inverted indexes by running the command
reload-data
in the directory Utilities/bin.
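Because build-inverted-indexes.pl appends to logs/indexer.log, the messages from the current run are exactly the bytes added after the run started. The Python sketch below uses that: note the log's size before indexing, then read only from that offset afterwards. The format of indexer.log messages is not documented here, so treating any line that contains "error" (in any case) as an error is an assumption; adjust the test to the messages you actually see.

```python
import os


def log_size(path):
    """Size of the log in bytes, or 0 if it does not exist yet."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return 0


def new_error_lines(path, start_offset):
    """Lines appended to `path` after `start_offset` that mention "error".

    Matching on the word "error" (any case) is an assumption about the
    indexer's messages, not a documented format.
    """
    with open(path, "rb") as log:
        log.seek(start_offset)
        appended = log.read().decode("utf-8", errors="replace")
    return [line for line in appended.splitlines() if "error" in line.lower()]
```

Call `before = log_size("logs/indexer.log")` just before running the indexer and `new_error_lines("logs/indexer.log", before)` afterwards; an empty list is the cue to run reload-data.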
7. Register the new handle for this document
(Contact help@ncstrl.org if you need help with this step.)
Monitoring the log files
Dienst maintains a log directory, logs, that contains the activity log for each day.
Each daily log is named logs/dienst.log.YYYY-MM-DD.
Every entry in this file is prefixed by a date/time stamp in the format dd
mm yy hh:mm:ss. Messages in the log file fall into the following
categories:
- Errors - these indicate an unexpected fatal error in your server,
such as a bug in the code or a database or configuration problem. All
of these messages have the string ERROR:: following the
date/time stamp.
- Warnings - these indicate a request- or client-oriented
problem. Some possible warnings are:
  - "Unsupported dienst request", which means that the server received an invalid protocol request; this may indicate a user trying to find
    a security loophole in your server.
  - "Possible malformed tag...", which means that an RFC-1807 bibliography file
    may have an invalid format.
  - "Server contacted via telnet", which means that a telnet connection was made to
    the server port; this may also indicate a user trying to find
    a security loophole in your server.
  All of these messages have the string WARNING:: following the
  date/time stamp.
- Administrative messages - these are status messages indicating
when your server was started, reloaded, etc. These messages have the string
ADMIN:: preceded by a date/time stamp.
- Network messages - these are status messages about network
usage. These messages have the string
NETWORK:: preceded by a date/time stamp.
- Transaction messages - these record each Dienst
protocol request that is received. These messages have the
string
TRANSACTION:: preceded by a date/time stamp, and
followed by the protocol request.
- Statistics messages - these are messages giving information about Search
requests that are made from your server. These messages have the
string
STATISTICS:: preceded by a date/time stamp, and
followed by detailed information about the results of the Search requests.
We recommend that you monitor your log file for errors and
warnings. You can do this manually, but a better method is
to use an automated monitoring tool such as swatch.
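As a starting point for such monitoring, or to see at a glance how a day's log breaks down by category, the Python sketch below counts a daily log's entries by class and collects the ERROR:: and WARNING:: lines. It relies only on what is described above: the logs/dienst.log.YYYY-MM-DD naming and the class string that follows the date/time stamp; the stamp itself is not parsed, and the function names are hypothetical. Unlike swatch, it does not watch the log continuously or trigger actions.

```python
import re
from collections import Counter
from datetime import date

CLASSES = ("ERROR", "WARNING", "ADMIN", "NETWORK", "TRANSACTION", "STATISTICS")
# The first "CLASS::" string on a line, which follows the date/time stamp.
CLASS_RE = re.compile("(" + "|".join(CLASSES) + ")::")


def daily_log_path(day: date, logdir: str = "logs") -> str:
    """Daily log name, e.g. logs/dienst.log.1995-08-28."""
    return f"{logdir}/dienst.log.{day.isoformat()}"


def summarize(lines):
    """Count entries per class; collect ERROR:: and WARNING:: lines."""
    counts, alerts = Counter(), []
    for line in lines:
        m = CLASS_RE.search(line)
        if m is None:
            continue  # no class string found (e.g. a wrapped message line)
        counts[m.group(1)] += 1
        if m.group(1) in ("ERROR", "WARNING"):
            alerts.append(line.rstrip("\n"))
    return counts, alerts
```

For example, opening `daily_log_path(date.today())` and passing the file object to `summarize` gives today's counts and alerts, which a cron job could then mail to you.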
The Dienst protocol also offers a way to display the logs by date and by class.
Dienst is shipped with an html page, Dienst/htdocs/dienst_runtime/logs.html, for system administrators. This page is a form that lets you enter the optional variables, and then generates the Dienst protocol request that displays a particular log.
Log summary reports
Dienst (as of version 4-1-2) is shipped with a utility that parses each daily log and creates a
summary file named logs/summary.YYYY-MM-DD.
The Dienst protocol has also been extended to generate several reports
based on these summary files.
To create a summary file, run Utility/bin/build-log-summary.
(Its documentation is at the beginning of the source code.) Note that this step is not automated; we suggest that you create a cron job to run it (see your Unix manual for details).
The reports that can be generated are:
- Daily Log Summary - log entry class counted by day.
- Daily Transaction/Search Summary - counts of transactions and searches by day.
- Transaction Total Summary - Daily counts of transactions by Dienst protocol Service/Verb.
- Server Statistics - document hits and search time by day for each Dienst server.
- Top 40 Documents - list of most requested documents.
- Top 100 Originating Sites - the 100 originating sites with the most transactions during the period, with a Period Summary.
- Top 40 Search Criteria - listing of search criteria most used.
The Dienst protocol also offers a way to create these reports.
Dienst (as of version 4-1-2) is shipped with an html page, Dienst/htdocs/dienst_runtime/logs.html, for system administrators. This page is a form that lets you enter the optional variables, and then generates and sends the Dienst protocol request that displays a specific summary report.
Checking the database
The utility db_check (in LibMgt) helps you
maintain your database. With it, you can check the size and
validity of your documents. The utility is extensible, and new
routines can be added to support new file formats.
The database checker can check either the size or the validity of your
files. The size check displays a single table giving the number of files
and their total size for each format. The verify check produces a verbose error
report and takes a long time to run. You can check a single
document, all documents of a given year, all documents by a given
publisher, or the entire collection.
Here is an example: the size report for the file system on Cornell's
machine in late August 1995.
format               Nfiles  size in bytes
------------------- ------- --------------
bib                    1528        1631260
composite              3226      168104299
composite_imagemap     3235        5397500
inline                 6169     1997963562
inline_imagemap        3497       16960525
ocr                    1210       78759442
paragraph_text         1283       97653226
postscript              145      162808112
scanned                5780     3556458876
structure              1189        1639090
xdoc                   5637      308916313
total                           6396292205
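If you keep these size reports over time, for example to track growth, a short script can read them back. The Python sketch below assumes the layout shown above (a header, a dashed rule, one row per format, and a final total row); the function names are hypothetical, and db_check's actual output may vary between versions. It also checks that the per-format sizes add up to the reported total.

```python
def parse_size_report(text):
    """Parse a db_check size table into ({format: (nfiles, nbytes)}, total)."""
    rows, total = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "format" or set(line.strip()) <= {"-", " "}:
            continue  # blank line, header, or dashed rule
        if fields[0] == "total":
            total = int(fields[-1])
        else:
            name, nfiles, nbytes = fields
            rows[name] = (int(nfiles), int(nbytes))
    return rows, total


def totals_agree(rows, total):
    """True if the per-format byte counts add up to the reported total."""
    return sum(nbytes for _, nbytes in rows.values()) == total
```

Given the August 1995 table above, `parse_size_report` returns 11 formats and a total of 6396292205, and `totals_agree` returns True; the scanned images alone account for about 3.56 of the roughly 6.40 billion bytes.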
For more information see the db_check reference page.
Any comments or questions?
Contact us at help@ncstrl.org.