Filter files use the standard UNIX regular expression syntax (as defined by the POSIX standard), not the csh ``globbing'' syntax. For example, you would use ``.*abc'' to indicate any string ending with ``abc'', not ``*abc''. A filter file has the following syntax:
    Deny regex
    Allow regex
The URL-Filter regular expressions are matched only on the URL-path portion of each URL (the scheme, hostname, and port are excluded). For example, the following URL-Filter file would allow all URLs except those whose path matches the regular expression ``/gatherers/'':
    Deny /gatherers/
    Allow .
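As a rough illustration of how such a file is evaluated, the following Python sketch parses the filter lines and tests a URL's path against them in order. It is not the Gatherer's own code: the names `parse_filter` and `url_allowed` are hypothetical, Python's `re` module stands in for POSIX regular expressions (the two agree for simple patterns like these), and the result when no rule matches (`default=True`) is an assumption.

```python
import re
from urllib.parse import urlsplit


def parse_filter(text):
    """Turn 'Allow regex' / 'Deny regex' lines into (action, regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        action, pattern = line.split(None, 1)
        rules.append((action.lower(), re.compile(pattern)))
    return rules


def url_allowed(rules, url, default=True):
    """Apply the rules in order to the URL-path only; the first match decides.

    The `default` used when nothing matches is an assumption of this sketch.
    """
    path = urlsplit(url).path or "/"  # scheme, hostname, and port are ignored
    for action, regex in rules:
        if regex.search(path):
            return action == "allow"
    return default


rules = parse_filter("Deny /gatherers/\nAllow .\n")
print(url_allowed(rules, "http://www.example.com/gatherers/index.html"))
print(url_allowed(rules, "http://www.example.com/docs/index.html"))
```

Because only the path is examined, the same filter file treats a given path identically regardless of which host serves it.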
Another common use of URL-Filters is to prevent the Gatherer from travelling ``up'' a directory. Automatically generated HTML pages for HTTP and FTP directories often contain a link to the parent directory, ``..''. To keep the Gatherer below a specific directory, use a URL-Filter file such as:
    Allow ^/my/cool/stuff/
    Deny .
Host-Filter regular expressions are matched on the ``hostname:port'' portion of each URL. Because the port is included, you cannot use ``$'' to anchor the end of a hostname. Beginning with version 1.3, IP addresses may be specified in place of hostnames. A class B address such as 128.138.0.0 would be written as ``^128\.138\..*'' in regular expression syntax. For example:
    Deny bcn.boulder.co.us:8080
    Deny bvsd.k12.co.us
    Allow ^128\.138\..*
    Deny .
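The Host-Filter example can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Gatherer's implementation: `host_key`, `host_allowed`, and `DEFAULT_PORTS` are hypothetical names, filling in a scheme's default port when the URL omits one is an assumption, and so is allowing a host when no rule matches. The address 128.138.243.151 and the host www.example.com are made-up test values.

```python
import re
from urllib.parse import urlsplit

# Assumed default ports, used only when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}

# The four rules from the example above, in file order.
HOST_RULES = [
    ("deny", r"bcn.boulder.co.us:8080"),
    ("deny", r"bvsd.k12.co.us"),
    ("allow", r"^128\.138\..*"),
    ("deny", r"."),
]


def host_key(url):
    """Build the 'hostname:port' string that Host-Filter patterns see."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 0)
    return f"{parts.hostname}:{port}"


def host_allowed(url, rules=HOST_RULES):
    """Return the verdict of the first rule whose pattern matches."""
    key = host_key(url)
    for action, pattern in rules:
        if re.search(pattern, key):
            return action == "allow"
    return True  # assumption: hosts matching no rule are allowed


print(host_key("http://bvsd.k12.co.us/"))
print(host_allowed("http://128.138.243.151/"))
# A pattern ending in '$' right after the hostname cannot match,
# because ':port' always follows the hostname in the key.
print(re.search(r"co\.us$", host_key("http://bvsd.k12.co.us/")))
```

To anchor the end of a hostname in this scheme, a pattern would have to account for the port as well, for example by ending in ``:'' followed by the port number.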
The order of the Allow and Deny entries is important, since the filters are applied sequentially from first to last. So, for example, if you list ``Allow .*'' first, no subsequent Deny expressions will be used, since this Allow filter matches every entry.
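The effect of ordering can be seen in a small Python sketch that evaluates the same two rules in both orders. The helper `first_match` is a hypothetical stand-in for the filter logic, and Python's `re` module is used in place of POSIX regular expressions.

```python
import re


def first_match(rules, text):
    """Return the action of the first rule whose pattern matches, else None."""
    for action, pattern in rules:
        if re.search(pattern, text):
            return action
    return None


deny_first = [("Deny", "/gatherers/"), ("Allow", ".*")]
allow_first = [("Allow", ".*"), ("Deny", "/gatherers/")]

path = "/gatherers/index.html"
print(first_match(deny_first, path))   # the Deny rule is reached first
print(first_match(allow_first, path))  # 'Allow .*' matches first; Deny is never consulted
```

Placing the most specific rules first and the catch-all rule last keeps every entry reachable.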