The Object Cache allows users to retrieve FTP, Gopher, and HTTP data quickly and efficiently, often avoiding the need to cross the Internet. The Harvest cache is more than an order of magnitude faster than the CERN cache and other popular Internet caches, because it never forks for WWW and Gopher access, is implemented with non-blocking I/O, keeps metadata and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of both objects and DNS lookups. A technical paper discussing the Harvest cache's design, implementation, and performance is available [7].
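As an illustration of the negative caching mentioned above, the following is a minimal sketch and not the Harvest implementation (which is written in C). The class name LookupCache, the two TTL parameters, and the injected resolve and clock callables are all hypothetical names chosen for this example. The sketch shows the general idea only: a failed lookup is remembered for a short time, so repeated requests for a name that does not resolve do not trigger repeated lookups.

```python
import time


class LookupCache:
    """Illustrative cache that remembers failed lookups as well as successful ones."""

    def __init__(self, resolve, positive_ttl=3600.0, negative_ttl=300.0,
                 clock=time.monotonic):
        # resolve: hypothetical callable, name -> address, raising LookupError on failure
        self._resolve = resolve
        self._positive_ttl = positive_ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._entries = {}  # name -> (expires_at, address or None for a cached failure)

    def lookup(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[0] > now:
            address = entry[1]
            if address is None:
                # Negative hit: fail immediately without calling the resolver again.
                raise LookupError(f"{name}: cached failure")
            return address
        try:
            address = self._resolve(name)
        except LookupError:
            # Negative caching: record the failure with its own, shorter lifetime.
            self._entries[name] = (now + self._negative_ttl, None)
            raise
        self._entries[name] = (now + self._positive_ttl, address)
        return address
```

Giving failures a shorter lifetime than successes is a design choice assumed here, so that a name that starts resolving again is picked up reasonably soon.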
The Cache can be run in two different modes: as a proxy object cache, or as an httpd accelerator. In this section we discuss its use as a proxy cache; we discuss the httpd accelerator in Section 6.3.
The Cache consists of a main server program, cached; a Domain Name System lookup caching server program, dnsserver; a Perl program for retrieving FTP data; and some optional management and client tools. The FTP program arose because of FTP complexities: while we retrieve Gopher and HTTP data from across the Internet using C code built into cached, we retrieve remote FTP data using an external program (ftpget), which uses three Perl library files (discussed below). Once the FTP data have been loaded into the local cache, subsequent accesses are served from the cached copies without running these external programs.
When cached starts up, it spawns three dnsserver processes, each of which can perform a single, blocking Domain Name System (DNS) lookup at a time. Running lookups in these separate processes reduces the amount of time the cache waits for DNS lookups. The number of dnsserver processes to use can be changed in the cached.conf file. Future versions may implement non-blocking DNS queries inside the cached process and eliminate the need for dnsserver.
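The arrangement described above, a small fixed set of helpers that each perform one blocking lookup, can be sketched as follows. This is only an analogy under stated assumptions: it uses threads from Python's standard library rather than separate dnsserver processes, and the class name ResolverPool and its methods are hypothetical. The default of three workers mirrors the default number of dnsserver processes.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


class ResolverPool:
    """Illustrative pool of workers, each performing one blocking lookup at a time."""

    def __init__(self, workers=3, resolve=socket.gethostbyname):
        # resolve is injectable so the pool can be exercised without network access.
        self._resolve = resolve
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, hostname):
        """Queue a lookup and return a Future at once; the caller does not block here."""
        return self._pool.submit(self._resolve, hostname)

    def close(self):
        """Wait for outstanding lookups to finish and stop the workers."""
        self._pool.shutdown(wait=True)
```

A caller would submit a host name, continue with other work, and collect the address later with the returned Future's result() method. With three workers, at most three lookups are in progress at once and any further requests wait in the queue.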
Another big change with version 1.3 is that objects cached to disk are persistent. Upon restart, cached now spends some time reloading metadata about on-disk objects. Currently, the cache does not make use of the HTTP conditional GET ("If-Modified-Since") feature, so it cannot revalidate a stale object with the origin server; cached objects are simply removed when they expire. A future release will support the conditional GET.
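The expiry behavior described above can be sketched as follows. This is a hedged illustration, not the cached code: the class name ExpiringStore, its methods, and the per-object ttl argument are hypothetical. The point it shows is that, without conditional GET, an expired object is simply dropped and must be fetched again in full.

```python
import time


class ExpiringStore:
    """Illustrative object store that drops entries once their expiry time passes."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._objects = {}  # url -> (expires_at, body)

    def put(self, url, body, ttl):
        """Store body under url for ttl seconds."""
        self._objects[url] = (self._clock() + ttl, body)

    def get(self, url):
        """Return the cached body, or None if it is absent or has expired."""
        entry = self._objects.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if expires_at <= self._clock():
            # No revalidation step: the expired object is removed outright.
            del self._objects[url]
            return None
        return body

    def purge_expired(self):
        """Remove every expired object and return how many were removed."""
        now = self._clock()
        expired = [url for url, (expires_at, _) in self._objects.items()
                   if expires_at <= now]
        for url in expired:
            del self._objects[url]
        return len(expired)
```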
By default, the cache sends an electronic mail message to cache_tracker@cs.colorado.edu to help us keep track of where caches are running on the Internet. The message lists only the host name, IP address, and port number. You can disable this message by changing the mail_trace configuration variable in the cached.conf file.