
5.10 Troubleshooting

   

 
Symptom
The Broker is running but always returns empty query results.

Solution
Check the broker.out file in the Broker's directory for error messages. If your Broker didn't index the data, use the administrative interface to force the Broker to build the index (see Section 5.6).

Symptom
I just upgraded to Glimpse 3.0, and searches fail.

Solution
The pre-3.0 indexes are incompatible with the 3.0 indexes. You need to reindex your data using Glimpse 3.0.

Symptom
When I query my Broker, I get a "500 Server Error".

Solution
Generally, "500" errors are caused by a CGI program that is not working correctly or by a misconfigured httpd server. Make sure that the userid running the HTTP server has access to the Harvest cgi-bin directory and the Perl include files in $HARVEST_HOME/lib. Refer to Section 3.5 for further details.

 

Symptom
I see duplicate documents in my Broker.

Solution
The Broker performs duplicate elimination based on a combination of MD5 checksums and Gatherer-Host,Name,Version. Therefore, you can end up with duplicate documents if your Broker collects from more than one Gatherer, each of which gathers from (a subset of) the same URLs. (As an aside, the reason for this notion of duplicate elimination is to allow a single Broker to contain several different SOIF objects for the same URL, but summarized in different ways.)

Two solutions to the problem are:

  1. Run your Gatherers on the same host.

  2. Remove the duplicate URLs in a customized version of the BrokerQuery program by doing a string comparison of the URLs.
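
The second approach can be sketched as follows. This is only an illustration in Python, not code from the BrokerQuery program: the function name drop_duplicate_urls, the "url" and "gatherer" field names, and the sample URLs are all assumptions made for the example. It keeps the first result seen for each URL and drops later results whose URL string is identical.

```python
def drop_duplicate_urls(results):
    """Keep the first result seen for each URL, preserving the original order.

    `results` is a list of dicts; the "url" key is a hypothetical field name
    chosen for this sketch, not part of the Harvest result format.
    """
    seen = set()
    unique = []
    for result in results:
        url = result["url"]
        if url in seen:
            continue  # exact string match with an earlier URL: skip it
        seen.add(url)
        unique.append(result)
    return unique


if __name__ == "__main__":
    # Hypothetical results for the same document reported by two Gatherers.
    hits = [
        {"url": "http://example.com/a.html", "gatherer": "host1"},
        {"url": "http://example.com/b.html", "gatherer": "host1"},
        {"url": "http://example.com/a.html", "gatherer": "host2"},
    ]
    for hit in drop_duplicate_urls(hits):
        print(hit["url"], hit["gatherer"])
```

Note that a plain string comparison treats two spellings of the same URL (for example, with and without a trailing slash) as different documents.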

 

Symptom
The Broker takes a long time and does not answer queries.

Solution
Some queries are quite expensive, because they involve a great deal of I/O. For this reason we modified the Broker so that if a query takes longer than 4 minutes, the query process is killed. The best solution is to use a less expensive query, for example by using less common keywords.
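
The time limit described above can be pictured with the following sketch. It is not the Broker's own code. It only illustrates the general policy of killing a child process that runs past a deadline, using Python's standard subprocess module. The function name run_with_timeout and the constant QUERY_TIMEOUT_SECONDS are assumptions made for the example.

```python
import subprocess
import sys

# Four minutes, matching the limit described in the text.
QUERY_TIMEOUT_SECONDS = 4 * 60


def run_with_timeout(command, timeout=QUERY_TIMEOUT_SECONDS):
    """Run `command` and return its standard output as text.

    Returns None if the command ran longer than `timeout` seconds;
    in that case subprocess.run kills the child before raising.
    """
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.stdout


if __name__ == "__main__":
    # A stand-in for an expensive query: sleep longer than the limit allows.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout=1))
```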

 

Symptom
Some of the query options (such as structured or case sensitive queries) aren't working.

Solution
This usually means you are using an index/search engine that does not support structured queries (like the current Harvest support for commercial WAIS). One way this happens is if you use a replica site that is running a different engine than you're used to (e.g., the Brokers at town.hall.org use commercial WAIS, while the Brokers at harvest.cs.colorado.edu use Glimpse). If you are setting up your own Broker (rather than using someone else's Broker), see Section 5.8 for details on how to switch to other index/search engines. Or, it could be that your BrokerQuery.cgi program is an old version and should be updated.

 

Symptom
I get syntax errors when I specify queries.

Solution
Usually this means you did not use double quotes where needed. See Section 5.3.

   

Symptom
When I submit a query, I get an answer faster than I can believe it takes to perform the query, and the answer contains garbage data.

Solution
This probably indicates that your httpd is misconfigured. A common case is not putting the 'ScriptAlias' before the 'Alias' in your conf/srm.conf file, when running the NCSA httpd. (The 'ScriptAlias' and 'Alias' setup is described in the INSTRUCTIONS file in the Harvest software distribution.)
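
As a quick check of the ordering, something like the following sketch could be run against the contents of srm.conf. It is an assumption-laden illustration, not a tool shipped with Harvest: the function name script_alias_precedes_alias is made up, the paths in the sample are hypothetical, and the check only looks at the order of the directive lines, not at which paths they map.

```python
def script_alias_precedes_alias(conf_text):
    """Return True if every ScriptAlias line comes before the first Alias line.

    Also returns True when the file lacks one of the two directives,
    since there is then no ordering to get wrong.
    """
    first_alias = None
    last_script_alias = None
    for number, line in enumerate(conf_text.splitlines()):
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        directive = words[0].lower()
        if directive == "scriptalias":
            last_script_alias = number
        elif directive == "alias" and first_alias is None:
            first_alias = number
    if first_alias is None or last_script_alias is None:
        return True
    return last_script_alias < first_alias


if __name__ == "__main__":
    # Hypothetical srm.conf fragment; the paths are placeholders.
    sample = """\
# aliases for the Broker
Alias /Harvest/ /usr/local/harvest/
ScriptAlias /Harvest/cgi-bin/ /usr/local/harvest/cgi-bin/
"""
    print(script_alias_precedes_alias(sample))
```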

 

Symptom
When I make changes to the Broker configuration via the administration interface, they are lost after the Broker is restarted.

Solution
The Broker administration interface does not save changes across sessions. Permanent changes to the Broker configuration should be done through the broker.conf file.

Symptom
My Broker is running very slowly.

Solution
Performance tuning can be complicated, but the most likely problem is that you are running on a machine with insufficient RAM, and paging a lot because the query engine kicks pages out in order to access the needed index and data files. (In UNIX the disk buffer cache competes with program and data pages for memory.)

A simple way to tell is to run "vmstat 5" in one window, and after a couple of lines of output issue a query from another window. The vmstat command prints a line of measurements about the virtual memory status of your machine every 5 seconds. In particular, look at the "pi" and "po" columns. If the numbers suddenly jump into the 500-1,000 range after you issue the query, you are paging a lot.
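
If you save the vmstat output to a file, the check can be automated along the lines of the sketch below. The function names, the threshold constant, and the sample output are assumptions made for illustration. The sample is invented rather than captured from a real machine, and the column layout of vmstat differs between systems, which is why the code locates "pi" and "po" from the header line instead of assuming fixed positions.

```python
# Lower end of the 500-1,000 range mentioned in the text.
PAGING_THRESHOLD = 500

# Invented sample in the general shape of vmstat output (not real data).
SAMPLE_OUTPUT = """\
 procs     memory            page            faults      cpu
 r b w   swap  free  re  mf  pi  po  fr  de  sr   in   sy   cs us sy id
 0 0 0  81234 10240   0   3   2   1   0   0   0  120  340  150  2  1 97
 0 0 0  80112  2048   4  90 640 712  50   0  30  410  980  520 20 15 65
"""


def paging_samples(vmstat_text):
    """Return a list of (pi, po) integer pairs, one per data line."""
    columns = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "pi" in fields and "po" in fields:
            # Header line: remember where the two columns are.
            columns = (fields.index("pi"), fields.index("po"))
            continue
        if columns is None or not fields:
            continue
        try:
            samples.append((int(fields[columns[0]]), int(fields[columns[1]])))
        except (IndexError, ValueError):
            continue  # group headings and other non-numeric lines
    return samples


def is_paging_heavily(samples, threshold=PAGING_THRESHOLD):
    """True if any sample shows page-ins or page-outs at or above threshold."""
    return any(pi >= threshold or po >= threshold for pi, po in samples)


if __name__ == "__main__":
    samples = paging_samples(SAMPLE_OUTPUT)
    print(samples)
    print(is_paging_heavily(samples))
```

If your vmstat does not label its columns "pi" and "po", paging_samples returns an empty list and you will need to adapt the column names.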

Note that paging problems are accentuated by running simultaneous memory-intensive or disk I/O-intensive programs on your machine. For example, we have performance problems on our demonstration machine (harvest.cs.colorado.edu) because we run over a dozen Brokers there. If several users issue queries to different Brokers at once, quite a bit of paging results, and performance degrades noticeably. Simultaneous queries to a single Broker should not cause a paging problem, because the Broker processes the queries sequentially.

It is best to run Brokers on an otherwise mostly unused machine with at least 64 MB of RAM (or more, if the above "vmstat" experiment indicates you are paging a lot).

One other performance enhancer is to run an httpd-accelerator (see Section 6.3) on your Broker machine, to intercept queries headed for your Broker. While it will not cache the results of queries, it will reduce load on the machine because it provides a very efficient means of returning results in the case of concurrent queries. Without the accelerator, the results are sent back by one BrokerQuery.pl UNIX process per query, and these processes are inefficiently time-sliced by the UNIX kernel. With an accelerator, the BrokerQuery.pl processes exit quickly and let the accelerator send the results back to the concurrent users. The accelerator will also reduce load for (non-query) retrievals of data from your httpd server.

 






Duane Wessels
Wed Jan 31 23:46:21 PST 1996