7.1 Overview


The Harvest replicator distributes copies of a Broker's database to replicas running throughout the Internet. Replication spreads the server load of a Broker across several sites, improving query performance and availability. The replicator also attempts to minimize network traffic and server workload when propagating updates.

The replicator manages a single directory tree of files. One site must be designated as the master copy. Updates to the master copy propagate to all other replicas, and the master copy eventually overwrites any local changes made at individual replicas. A replicated collection can also be configured so that different master copies manage separate sub-trees, which distributes the responsibility for (gathering and) managing a large collection.

Each replicated collection is exported through a replication group, which may be a single group or a hierarchy of nested groups. When a replica joins a replication group, it begins to fill with data. The right to join a replication group is controlled by an access control list. If a replication group grows to hundreds or thousands of members, a new group can be created to ease management. This arrangement is illustrated in Figure 4.

Figure 4: Replicator System Overview
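
As an illustration of the sub-tree arrangement described above, the following sketch resolves which master copy is responsible for a given path by picking the deepest configured sub-tree that contains it. This is not the replicator's own configuration format or code; the table SUBTREE_MASTERS, the host names, and the function master_for are all hypothetical and exist only for this example.

```python
from pathlib import PurePosixPath

# Hypothetical assignment of sub-trees to master sites (all names invented
# for illustration; the real replicator is configured differently).
SUBTREE_MASTERS = {
    PurePosixPath("/"): "master-a.example.org",
    PurePosixPath("/cs"): "master-b.example.org",
    PurePosixPath("/cs/techreports"): "master-c.example.org",
}


def master_for(path: str, masters=SUBTREE_MASTERS) -> str:
    """Return the master for `path`: the deepest sub-tree root containing it."""
    p = PurePosixPath(path)
    # Check the path itself, then each ancestor from nearest to farthest;
    # the first match is therefore the most specific sub-tree.
    for candidate in (p, *p.parents):
        if candidate in masters:
            return masters[candidate]
    raise KeyError(f"no master configured for {path}")
```

Under these assumptions, a file below /cs/techreports is owned by the third master, while any other file below /cs falls back to the second, and everything else to the first.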

The Harvest replicator consists of four components:

The FTP mirror system checks file dates and sizes and performs the actual file transfers between replicas.
Mirrord generates configuration files that tell FTP mirror where to retrieve data, based on recent bandwidth and delay estimates between group members.
Floodd periodically performs bandwidth and delay measurements among members of a replication group, for use by mirrord. A floodd group master computes the "logical update topology" for the group, which depends on the bandwidth and delay estimates between group members.
Archived distributes updates of the Version file, which is used to determine when to run FTP mirror. This reduces the frequency with which FTP mirror runs, improving performance. The Version file is updated each time the Harvest Gatherer runs.
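
The role of the Version file in the list above can be sketched as follows: a replica compares the version it was last synchronized to with the version most recently announced, and starts a transfer only when the two differ. This is a minimal sketch and not the replicator's actual code. It assumes the Version file can be treated as an opaque string, and the names read_version, sync_if_changed, and run_mirror are hypothetical; run_mirror merely stands in for invoking FTP mirror.

```python
from pathlib import Path
from typing import Callable


def read_version(path: Path) -> str:
    """Return the stripped contents of a Version file, or "" if it is missing."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return ""


def sync_if_changed(announced_version: str, local_version_file: Path,
                    run_mirror: Callable[[], None]) -> bool:
    """Run the mirror step only when the announced version differs.

    Returns True if a transfer was triggered, False if it was skipped.
    """
    if announced_version == read_version(local_version_file):
        return False  # nothing new: skip the expensive mirror run
    run_mirror()  # hypothetical hook standing in for FTP mirror
    # Record the version we are now synchronized to.
    local_version_file.write_text(announced_version + "\n")
    return True
```

Because the comparison is a cheap string check, the costly walk over file dates and sizes happens only after the announced version has actually changed.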

The replication system design is discussed in more depth in [8].


Duane Wessels
Wed Jan 31 23:46:21 PST 1996