The $HARVEST_HOME/lib/gatherer directory contains the default summarizers described in Section 4.5.1, plus various utility programs needed by the summarizers and the Gatherer, as follows:
*.sum
*.unnest
*2soif
cksoif
checks the validity of a SOIF stream read from stdin:
cksoif < INPUT.soif
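A validity check of this kind can be sketched in Python. This is a minimal sketch, not the cksoif source. It assumes the usual SOIF layout: a "@TYPE { URL" header line, attribute lines of the form "Name{size}:<TAB>value" where size is the length of the value in bytes (so a value may span several lines), and a closing "}". The function name check_soif and the accepted attribute-name characters are assumptions made for illustration.

```python
import re

# Assumed SOIF header, e.g. "@FILE { http://example.com/"
HEADER = re.compile(rb"@([A-Za-z0-9_-]+)[ \t]*\{[ \t]*(\S+)[ \t]*\n")
# Assumed attribute prefix, e.g. "Title{5}:\t"; exactly 5 value bytes follow it
ATTR = re.compile(rb"([A-Za-z0-9_.-]+)\{(\d+)\}:\t")


def check_soif(data: bytes) -> list:
    """Return a list of error messages; an empty list means the stream parsed cleanly."""
    errors = []
    pos, n = 0, len(data)
    while pos < n:
        # Skip whitespace between templates
        while pos < n and data[pos:pos + 1].isspace():
            pos += 1
        if pos >= n:
            break
        m = HEADER.match(data, pos)
        if not m:
            errors.append(f"byte {pos}: expected '@TYPE {{ URL' header")
            return errors
        pos = m.end()
        while True:
            if data.startswith(b"}", pos):
                pos += 1  # end of this template
                break
            a = ATTR.match(data, pos)
            if not a:
                errors.append(f"byte {pos}: malformed attribute or missing '}}'")
                return errors
            name = a.group(1).decode()
            end = a.end() + int(a.group(2))
            if end > n:
                errors.append(f"byte {pos}: value of {name} runs past end of stream")
                return errors
            pos = end
            # The declared byte count must land exactly on the newline ending the value
            if data[pos:pos + 1] != b"\n":
                errors.append(f"byte {pos}: declared size of {name} does not match its value")
                return errors
            pos += 1
    return errors
```

To mirror the command-line usage, pass sys.stdin.buffer.read() to check_soif and treat a non-empty result as failure. Working on bytes rather than text matters because SOIF sizes count bytes, not characters.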
The following programs prepare the Gatherer's database for export by gatherd:
cleandb
ensures that all SOIF objects are valid, and deletes any that are not;
consoldb
consolidates n GDBM database files into a single GDBM database file;
expiredb
deletes any SOIF objects that are no longer valid as defined by their Time-To-Live attribute;
folddb
runs all of the operations needed to prepare the Gatherer's database for export by gatherd;
mergedb
consolidates GDBM files as described in Section 4.7.7;
mkcompressed
generates the compressed cache All-Templates.gz file;
mkgathererstats.pl
generates the INFO.soif statistics file;
mkindex
generates the cache of timestamps; and
rmbinary
removes binary data from a GDBM database.
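The expiry step performed by expiredb can be illustrated with a small Python sketch. It is not the expiredb implementation: an in-memory dict (URL to attribute dict) stands in for the GDBM file, and it assumes each object also carries an Update-Time attribute, with the object expiring once Update-Time plus Time-To-Live (both in seconds) lies in the past. The function name expire is hypothetical.

```python
import time
from typing import Dict, List, Optional


def expire(db: Dict[str, Dict[str, str]], now: Optional[float] = None) -> List[str]:
    """Delete expired templates from db in place and return their URLs.

    Assumption: expiry time = Update-Time + Time-To-Live (seconds since the epoch).
    Objects missing either attribute, or with non-numeric values, are kept.
    """
    if now is None:
        now = time.time()
    expired = []
    for url, attrs in db.items():
        try:
            deadline = int(attrs["Update-Time"]) + int(attrs["Time-To-Live"])
        except (KeyError, ValueError):
            continue  # cannot decide, so leave the object alone
        if deadline < now:
            expired.append(url)
    # Delete after the scan so the dict is not modified while iterating
    for url in expired:
        del db[url]
    return expired
```

Passing now explicitly makes the behaviour easy to test; a real run would use the current time.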
dbcheck
checks a URL to see if it has changed since the last time it was gathered;
enum
performs a RootNode enumeration on the given URLs;
fileenum
performs a RootNode enumeration on "file" URLs;
ftpenum
calls ftpenum.pl to perform a RootNode enumeration on "ftp" URLs;
gopherenum
performs a RootNode enumeration on "gopher" URLs;
httpenum
performs a RootNode enumeration on "http" URLs;
newsenum
performs a RootNode enumeration on "news" URLs;
prepurls
is a wrapper program used to pipe the Gatherer and essence together;
staturl
retrieves LeafNode URLs so that dbcheck can determine whether the URL has been modified.
All of these programs are internal to the Gatherer.
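One way to picture a RootNode enumeration is as a breadth-first walk from the root URL that collects the URLs it reaches, bounded by a URL count and an optional depth limit. The Python sketch below is only that picture, not the logic of enum or httpenum: the links dict is a hypothetical in-memory stand-in for retrieving each page and extracting its links, and the parameter names max_urls and max_depth are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional


def enumerate_root(root: str, links: Dict[str, List[str]],
                   max_urls: int, max_depth: Optional[int] = None) -> List[str]:
    """Breadth-first enumeration of the URLs reachable from a RootNode.

    links is a hypothetical map from each URL to the URLs it links to;
    max_depth=None means no depth limit in this sketch.
    """
    seen = {root}
    found = []
    queue = deque([(root, 0)])
    while queue and len(found) < max_urls:
        url, depth = queue.popleft()
        found.append(url)
        if max_depth is not None and depth >= max_depth:
            continue  # at the depth limit: record the URL but do not follow its links
        for child in links.get(url, []):
            if child not in seen:  # never enqueue a URL twice, even if linked repeatedly
                seen.add(child)
                queue.append((child, depth + 1))
    return found
```

The seen set is what keeps cyclic link structures (a page linking back to the root, for instance) from being enumerated forever.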
essence
essence [options] -f input-URLs
or essence [options] URL ...
--dbdir directory       Directory to place database
--full-text             Use entire file instead of summarizing
--gatherer-host         Gatherer-Host value
--gatherer-name         Gatherer-Name value
--gatherer-version      Gatherer-Version value
--help                  Print usage information
--libdir directory      Directory to place configuration files
--log logfile           Name of the file to log messages to
--max-deletions n       Number of GDBM deletions before reorganization
--minimal-bookkeeping   Generates a minimal amount of bookkeeping attrs
--no-access             Do not read contents of objects
--no-keywords           Do not automatically generate keywords
--allowlist filename    File with list of types to allow
--stoplist filename     File with list of types to remove
--tmpdir directory      Name of directory to use for temporary files
--type-only             Only type data; do not summarize objects
--verbose               Verbose output
--version               Version information
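The --stoplist and --allowlist options filter objects by type. The Python sketch below shows one plausible reading of that filtering and is not essence's code: the list-file format (one type per line, blank lines and "#" comments skipped) and the rule that an allowlist takes precedence over a stoplist are assumptions, as are the function names and the type names used in testing.

```python
from typing import Iterable, List, Optional, Set, Tuple


def parse_type_list(text: str) -> Set[str]:
    """Read one type name per line; blank lines and '#' comments are skipped (assumed format)."""
    types = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            types.add(line)
    return types


def filter_by_type(objects: Iterable[Tuple[str, str]],
                   stoplist: Optional[Set[str]] = None,
                   allowlist: Optional[Set[str]] = None) -> List[Tuple[str, str]]:
    """Keep the (url, type) pairs that pass the lists.

    Assumed semantics: when an allowlist is given only its types survive;
    otherwise every type except those on the stoplist survives.
    """
    kept = []
    for url, obj_type in objects:
        if allowlist is not None:
            keep = obj_type in allowlist
        else:
            keep = stoplist is None or obj_type not in stoplist
        if keep:
            kept.append((url, obj_type))
    return kept
```

Using sets for the lists keeps each membership test constant-time regardless of how many types a list file names.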
extractdb, print-attr
extractdb GDBM-file Attribute
print-attr
uses stdin rather than GDBM-file.
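Pulling one attribute out of every template, as extractdb and print-attr do, amounts to parsing the SOIF stream and collecting that attribute's values. This Python sketch assumes the same SOIF layout as above ("@TYPE { URL", "Name{size}:<TAB>value", closing "}") and returns (URL, value) pairs; the exact output format of the real tools is not reproduced, and the names parse_soif and extract are hypothetical. Unlike a strict checker, the parser does not verify the newline after each value.

```python
import re

# Assumed header (leading whitespace allowed) and attribute-prefix layouts
HEADER = re.compile(rb"\s*@[A-Za-z0-9_-]+[ \t]*\{[ \t]*(\S+)[ \t]*\n")
ATTR = re.compile(rb"([A-Za-z0-9_.-]+)\{(\d+)\}:\t")


def parse_soif(data: bytes):
    """Yield (url, {attribute: value}) per template; raise ValueError on bad input."""
    pos = 0
    while True:
        m = HEADER.match(data, pos)
        if not m:
            if data[pos:].strip():
                raise ValueError(f"bad template header at byte {pos}")
            return  # only trailing whitespace left
        url = m.group(1).decode()
        pos = m.end()
        attrs = {}
        while not data.startswith(b"}", pos):
            a = ATTR.match(data, pos)
            if not a:
                raise ValueError(f"bad attribute at byte {pos}")
            start = a.end()
            end = start + int(a.group(2))  # the size is a byte count
            attrs[a.group(1).decode()] = data[start:end].decode("utf-8", "replace")
            pos = end + 1  # skip the newline that ends the value
        pos += 1  # skip the closing brace
        yield url, attrs


def extract(data: bytes, attribute: str) -> list:
    """Return (url, value) for every template that has the given attribute."""
    return [(url, attrs[attribute]) for url, attrs in parse_soif(data) if attribute in attrs]
```

Templates lacking the attribute are skipped rather than reported, which is a choice made for this sketch.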
gatherd, in.gatherd
gatherd [-db | -index | -log | -zip | -cf file] [-dir dir] port
in.gatherd [-db | -index | -log | -zip | -cf file] [-dir dir]
gdbmutil
gdbmutil consolidate [-d | -D] master-file file [file ...]
gdbmutil delete file key
gdbmutil dump file
gdbmutil fetch file key
gdbmutil keys file
gdbmutil print [-gatherd] file
gdbmutil reorganize file
gdbmutil restore file
gdbmutil sort file
gdbmutil stats file
gdbmutil store file key < data
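The consolidate subcommand merges several database files into a master file. The Python sketch below shows that idea with the standard dbm module; it is not gdbmutil. It uses dbm.dumb so that it runs anywhere (dbm.gnu is the binding that reads real GDBM files but is not always installed), assumes that later files overwrite earlier records with the same key, and does not model the -d and -D options. The function name consolidate is hypothetical.

```python
import dbm.dumb


def consolidate(master_path: str, *paths: str) -> int:
    """Copy every record from each file in paths into master_path; return records copied.

    Assumptions: later files win on duplicate keys, and -d/-D are not modelled.
    """
    copied = 0
    with dbm.dumb.open(master_path, "c") as master:  # "c": create if missing
        for path in paths:
            with dbm.dumb.open(path, "r") as src:  # sources are only read
                for key in src.keys():
                    master[key] = src[key]
                    copied += 1
    return copied
```

Swapping dbm.dumb for dbm.gnu would let the same loop operate on actual GDBM files where that module is available.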
mktemplate, print-template
print-template
can be used to "normalize" a SOIF stream;
it reads a stream of SOIF templates from stdin, parses them, then
writes a SOIF stream to stdout.
mktemplate < INPUT.txt > OUTPUT.soif
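The read, parse, write cycle that print-template applies can be sketched in Python as parsing each template and re-emitting it in a single canonical layout with the byte counts recomputed from the parsed values. The SOIF layout is the same assumption as above, and the canonical spacing chosen here ("@TYPE { URL") is an assumption too, not necessarily print-template's exact output. mktemplate's text input format is not modelled. The names parse and normalize are hypothetical.

```python
import re

# Assumed header (leading whitespace allowed) and attribute-prefix layouts
HEADER = re.compile(rb"\s*@([A-Za-z0-9_-]+)[ \t]*\{[ \t]*(\S+)[ \t]*\n")
ATTR = re.compile(rb"([A-Za-z0-9_.-]+)\{(\d+)\}:\t")


def parse(data: bytes):
    """Yield (template_type, url, [(name, value), ...]) per template, all as bytes."""
    pos = 0
    while True:
        m = HEADER.match(data, pos)
        if not m:
            if data[pos:].strip():
                raise ValueError(f"bad template header at byte {pos}")
            return
        pos = m.end()
        attrs = []
        while not data.startswith(b"}", pos):
            a = ATTR.match(data, pos)
            if not a:
                raise ValueError(f"bad attribute at byte {pos}")
            end = a.end() + int(a.group(2))
            if end > len(data):
                raise ValueError(f"value of {a.group(1).decode()} runs past end of stream")
            attrs.append((a.group(1), data[a.end():end]))
            pos = end + 1  # skip the newline that ends the value
        pos += 1  # skip the closing brace
        yield m.group(1), m.group(2), attrs


def normalize(data: bytes) -> bytes:
    """Re-emit every template in one canonical layout, recomputing byte counts."""
    out = []
    for ttype, url, attrs in parse(data):
        out.append(b"@" + ttype + b" { " + url + b"\n")
        for name, value in attrs:
            out.append(name + b"{%d}:\t" % len(value) + value + b"\n")
        out.append(b"}\n")
    return b"".join(out)
```

Because the output is itself valid input, normalizing twice gives the same bytes as normalizing once.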
quick-sum
template2db
template2db database [tmpl tmpl...]
wrapit
wrapit [Attribute]