The Broker retrieves indexing information from Gatherers or other Brokers
through its Collector interface. A list of collection points is
specified in the admin/Collection.conf configuration file. This file
contains one collection point per line, each with four fields. The first
field is the host of the remote Gatherer or Broker, the second field is the
port number on that host, the third field is the collection type, and the
fourth field is the query filter, or -- if there is no filter.
The Broker supports various types of collections as described below:
    Type  Remote Process  Description                   Compression?
    ----------------------------------------------------------------
    0     Gatherer        Full collection each time     No
    1     Gatherer        Incremental collections       No
    2     Gatherer        Full collection each time     Yes
    3     Gatherer        Incremental collections       Yes
    4     Broker          Full collection each time     No
    5     Broker          Incremental collections       No
    6     Broker          Collection based on a query   No
    7     Broker          Incremental based on a query  No
The query filter specification for collection types 6 and 7 contains two
parts: the --QUERY keywords portion and an optional --FLAGS flags portion.
The --QUERY portion is passed on to the Broker as the keywords for the
query (the keywords can be any Boolean and/or structured query); the
--FLAGS portion is passed on to the Broker as the indexer-specific flags
to the query. The following table shows the valid indexer-specific flags
for the supported indexers:
    Indexer   Flag                      Description
    -----------------------------------------------------------------
    All:      #desc                     Show Description Lines
    Glimpse:  #index case insensitive   Case insensitive
              #index case sensitive     Case sensitive
              #index error number       Allow "number" errors
              #index matchword          Matches on word boundaries
              #index maxresult number   Allow max of "number" results
              #opaque                   Show matched lines
    Wais:     #index maxresult number   Allow max of "number" results
              #opaque                   Show scores and rankings
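To make the two-part filter format concrete, here is a minimal Python sketch
that splits a fourth-field value into its keywords and flags. It is not part
of Harvest; the function name split_query_filter is hypothetical, and the
sketch assumes the literal markers --QUERY and --FLAGS appear at most once,
in that order.

```python
from typing import Optional, Tuple


def split_query_filter(spec: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a Collection.conf filter field into (keywords, flags).

    Hypothetical helper for illustration only. Returns (None, None) for
    the "--" placeholder that means "no filter".
    """
    spec = spec.strip()
    if spec == "--":
        return None, None
    if not spec.startswith("--QUERY"):
        raise ValueError(f"filter must be '--' or start with --QUERY: {spec!r}")

    # Everything after --QUERY, up to an optional --FLAGS marker, is the
    # keyword query; everything after --FLAGS is the indexer-specific flags.
    body = spec[len("--QUERY"):]
    keywords, marker, flags = body.partition("--FLAGS")
    return keywords.strip(), (flags.strip() if marker else None)


if __name__ == "__main__":
    print(split_query_filter("--QUERY Harvest --FLAGS #index case sensitive"))
```

The sketch only separates the two portions; it does not check the flags
against the table of valid indexer flags.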
The following example Collection.conf collects information from two
Gatherers (one using compressed incremental transfers, the other using
uncompressed full transfers) and from three Brokers (one incrementally
based on a timestamp, and the other two using query filters):
    gatherer-host1.foo.com 8500 3 --
    gatherer-host2.foo.com 8500 0 --
    broker-host1.foo.com 8501 5 --
    broker-host2.foo.com 8501 6 --QUERY (URL : document) AND gnu
    broker-host3.foo.com 8501 7 --QUERY Harvest --FLAGS #index case sensitive
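As an illustration of the four-field layout, the following Python sketch
parses one collection point and looks up its type in a table transcribed
from this section. It is a reading aid under stated assumptions, not
Harvest's own parser: the names CollectionPoint, COLLECTION_TYPES, and
parse_collection_line are hypothetical, and the sketch assumes fields are
separated by whitespace, with the fourth field running to the end of the
line.

```python
from typing import NamedTuple

# Transcribed from the collection type table in this section:
# type -> (remote process, collection mode, compressed?)
COLLECTION_TYPES = {
    0: ("Gatherer", "full", False),
    1: ("Gatherer", "incremental", False),
    2: ("Gatherer", "full", True),
    3: ("Gatherer", "incremental", True),
    4: ("Broker", "full", False),
    5: ("Broker", "incremental", False),
    6: ("Broker", "query", False),
    7: ("Broker", "incremental query", False),
}


class CollectionPoint(NamedTuple):
    host: str
    port: int
    ctype: int
    query_filter: str  # "--" when there is no filter


def parse_collection_line(line: str) -> CollectionPoint:
    """Parse one collection point (hypothetical helper, illustration only)."""
    # Split into at most four fields so the filter keeps its inner spaces.
    fields = line.split(None, 3)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    host, port, ctype, query_filter = fields
    ctype_num = int(ctype)
    if ctype_num not in COLLECTION_TYPES:
        raise ValueError(f"unknown collection type {ctype_num}")
    return CollectionPoint(host, int(port), ctype_num, query_filter.strip())


if __name__ == "__main__":
    point = parse_collection_line(
        "broker-host2.foo.com 8501 6 --QUERY (URL : document) AND gnu"
    )
    print(point, COLLECTION_TYPES[point.ctype])
```

Splitting on the first three runs of whitespace only is a deliberate choice
here, since query keywords and flags may themselves contain spaces.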