Version 1.3 has an improved debugging facility. Extra information from specific programs and library routines can be logged by setting debugging flags. A debugging flag has the form -Dsection,level, where section is an integer in the range 1-255 and level is an integer in the range 1-9. Debugging flags can be given on a command line, with the Debug-Options: tag in a gatherer configuration file, or by setting the environment variable $HARVEST_DEBUG. Examples:
    Debug-Options: -D68,5 -D44,1
    % httpenum -D20,1 -D21,1 -D42,1 http://harvest.cs.colorado.edu/
    % setenv HARVEST_DEBUG '-D20,1 -D23,1 -D63,1'
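The flag syntax described above can be checked mechanically before starting a long gatherer run. The following is a minimal Python sketch and is not part of Harvest: the function name parse_debug_flags is hypothetical, and the sketch assumes that optional whitespace after -D and around the comma is harmless.

```python
import re

# Matches one flag of the form -Dsection,level (optional spaces tolerated).
_FLAG = re.compile(r"-D\s*(\d+)\s*,\s*(\d+)")


def parse_debug_flags(text):
    """Return a list of (section, level) pairs found in text.

    Raises ValueError if a section is outside 1-255 or a level is
    outside 1-9, the ranges stated in this manual.
    """
    flags = []
    for match in _FLAG.finditer(text):
        section, level = int(match.group(1)), int(match.group(2))
        if not 1 <= section <= 255:
            raise ValueError(f"section {section} is not in 1-255")
        if not 1 <= level <= 9:
            raise ValueError(f"level {level} is not in 1-9")
        flags.append((section, level))
    return flags


if __name__ == "__main__":
    # Same value as the HARVEST_DEBUG example above.
    print(parse_debug_flags("-D20,1 -D23,1 -D63,1"))
```

The same string can be passed whether it came from a command line, a Debug-Options: line, or $HARVEST_DEBUG, since all three use the same flag form.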
Debugging sections and levels have been assigned to the following sections of the code:
    section 20, level 1         Common liburl URL processing
    section 21, level 1, 5      Common liburl HTTP routines
    section 22, level 1         Common liburl disk cache routines
    section 23, level 1         Common liburl FTP routines
    section 24, level 1         Common liburl Gopher routines
    section 25, level 1         urlget - standalone liburl program
    section 26, level 1         ftpget - standalone liburl program
    section 40, level 1, 5, 9   Gatherer URL enumeration
    section 41, level 1         Gatherer enumeration URL verification
    section 42, level 1, 5, 9   Gatherer enumeration for HTTP
    section 43, level 1         Gatherer enumeration for Gopher
    section 44, level 1, 5      Gatherer enumeration filter routines
    section 45, level 1         Gatherer enumeration for FTP
    section 46, level 1         Gatherer enumeration for file:// URLs
    section 48, level 1, 5      Gatherer enumeration robots.txt stuff
    section 60, level 1         Gatherer essence data object processing
    section 61, level 1         Gatherer essence database routines
    section 62, level 1         Gatherer essence main
    section 63, level 1         Gatherer essence type recognition
    section 64, level 1         Gatherer essence object summarizing
    section 65, level 1         Gatherer essence object unnesting
    section 66, level 1         Gatherer essence post-summarizing
    section 69, level 1, 5, 9   Common SOIF template processing
    section 80, level 1         Common utilities memory management
    section 81, level 1         Common utilities buffer routines
    section 82, level 1         Common utilities system(3) routines
    section 83, level 1         Common utilities pathname routines
    section 84, level 1         Common utilities hostname processing
    section 85, level 1         Common utilities string processing
    section 86, level 1         Common utilities DNS host cache
    section 102, level 1        Broker Glimpse indexing engine
So for directories, symbolic links, and CGI scripts, the HTTP server is always contacted. We don't perform URL translation for local mappings. If your URLs contain characters that must be escaped, then the local mapping will also fail. Add the debug option -D20,1 to see how local mappings are taking place.
With the --full-text option I see a lot of raw data in the content summaries, with few keywords I can search.

The --full-text option simply includes the full data content in the SOIF summaries. Using the individual file type summarizing mechanism described in Section 4.5.4 will work better in this regard, but will require you to specify how data are extracted for each individual file type. In a future version of Harvest we will change the Essence --full-text option to perform content extraction before including the full text of documents.
    REGEX_DEFINE = -DUSE_POSIX_REGEX
    REGEX_INCLUDE =
    REGEX_OBJ =
    REGEX_TYPE = posix
To verify that your system is configured for DNS, make sure that the file /etc/resolv.conf exists and is readable. Read the resolv.conf(5) manual page for information on this file. You can verify that DNS is working with the nslookup command.
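The first check described above can be scripted. The following is a minimal Python sketch, not part of Harvest: the function names resolv_conf_readable and nameservers_from_text are hypothetical, and the sketch only assumes the standard resolv.conf(5) convention that name servers are listed on lines beginning with the keyword nameserver.

```python
import os


def resolv_conf_readable(path="/etc/resolv.conf"):
    """Return True if the resolver configuration file exists and is readable."""
    return os.path.isfile(path) and os.access(path, os.R_OK)


def nameservers_from_text(text):
    """Return the addresses found on 'nameserver' lines, in file order."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (introduced by '#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    if resolv_conf_readable():
        with open("/etc/resolv.conf") as handle:
            print("name servers:", nameservers_from_text(handle.read()))
    else:
        print("/etc/resolv.conf is missing or unreadable")
```

An empty list of name servers, or a missing file, suggests that the system is not configured for DNS. A non-empty list does not prove that the servers answer, so it is still worth running nslookup.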
The Harvest executables for SunOS (4.1.3_U1) are statically linked with the stock resolver library from /usr/lib/libresolv.a. If you seem to have problems with the statically linked executables, please try to compile Harvest from the source code (see Section 3). This will make use of your local libraries, which may have been modified for your particular organization.
Some sites may use Sun Microsystems' Network Information Service (NIS) instead of, or in addition to, DNS. We believe that Harvest works on systems where NIS has been properly configured. The NIS servers (the names of which you can determine from the ypwhich command) must be configured to query DNS servers for hostnames they do not know about. See the -b option of the ypxfr command.
We would welcome reports of Harvest successfully working with NIS. Please email us at harvest-dvl@cs.colorado.edu.
If you see the "Host is unreachable" message, these are the likely problems:
If you see the "Connection refused" message, the likely problem is that you are trying to connect to an unused port on the destination machine. In other words, there is no program listening for connections on that port.
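To tell these two messages apart outside of Harvest, you can attempt a plain TCP connection and inspect the error. The following is a minimal Python sketch, not part of Harvest: the function names classify_connect_error and probe are hypothetical, and the host and port in the usage lines are only examples.

```python
import errno
import socket


def classify_connect_error(exc):
    """Map an OSError raised by connect() to one of the messages above."""
    if exc.errno == errno.ECONNREFUSED:
        return "Connection refused"
    if exc.errno == errno.EHOSTUNREACH:
        return "Host is unreachable"
    return "other error"


def probe(host, port, timeout=5.0):
    """Try a TCP connection and return 'ok' or a short description of the failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        # The hostname could not be resolved at all (a DNS or NIS problem).
        return "name lookup failed"
    except OSError as exc:
        return classify_connect_error(exc)


if __name__ == "__main__":
    # Example target only; substitute the host and port the gatherer reports.
    print(probe("harvest.cs.colorado.edu", 80))
```

A result of "Connection refused" means the host was reached but nothing is listening on that port, whereas "Host is unreachable" points to a routing or network problem between the two machines.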
The Harvest gatherer is essentially a WWW client. You should expect it to work the same as Mosaic, but without proxy support. We would be interested to hear about problems with Harvest and hostnames under the condition that the gatherer is unable to contact a host, yet you are able to use other network programs (Mosaic, telnet, ping) to that host without going through a proxy.