Topologies for linking collectives to implement multi-master replication

You have several options when you choose the topology for a deployment that incorporates multiple collectives. Multi-master replication topologies can be implemented in the DataPower® XC10 Appliance by creating multiple collectives and linking them.

Links connecting collectives

A replication data grid infrastructure is a connected graph of collectives with bidirectional links among them. With a link, two collectives can communicate data changes. For example, the simplest topology is a pair of collectives with a single link between them. In the topologies that follow, the collectives are named alphabetically: A, B, C, and so on. A link can cross a wide area network (WAN), spanning large distances. Even if the link is interrupted, you can still change data in either collective. The topology reconciles the changes when the link reconnects the collectives. Links automatically try to reconnect if the network connection is interrupted.

Figure: Link

After you set up the links, the product first tries to make every collective identical. Then, eXtreme Scale tries to maintain identical conditions as changes occur in any collective. The goal is for each collective to be an exact mirror of every other collective connected by the links. The replication links between the collectives help ensure that any change made in one collective is copied to the other collectives.
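
The following Java sketch is illustrative only; it does not use the appliance or eXtreme Scale APIs, and the class and variable names are hypothetical. It models the simplest topology, a pair of collectives joined by one bidirectional link, and shows how reconciling over the link makes the two collectives exact mirrors of each other.

  import java.util.*;

  // Illustrative only: the simplest topology, two collectives joined by one bidirectional link.
  public class PairOfCollectives {
      static Set<String> collectiveA = new HashSet<>();
      static Set<String> collectiveB = new HashSet<>();

      // When the link is connected (or reconnected), each side copies the changes it has
      // not yet seen from the other side, so both collectives converge on the same contents.
      static void synchronizeOverLink() {
          collectiveA.addAll(collectiveB);
          collectiveB.addAll(collectiveA);
      }

      public static void main(String[] args) {
          collectiveA.add("insert key1");   // change made in collective A
          collectiveB.add("insert key2");   // change made in collective B, possibly while the link was interrupted
          synchronizeOverLink();            // the link reconciles the changes
          System.out.println(collectiveA.equals(collectiveB));   // true: each collective mirrors the other
      }
  }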

Line topologies

Although a line topology is a simple deployment, it demonstrates some qualities of the links. First, a collective does not need to be connected directly to every other collective to receive changes. Collective B pulls changes from collective A. Collective C receives changes from collective A through collective B, which connects collectives A and C. Similarly, collective D receives changes from the other collectives through collective C. This ability spreads the load of distributing changes away from the source of the changes.

Figure: Line topology

Notice that if collective C fails, the following actions would occur:
  1. Collective D would be orphaned until collective C was restarted.
  2. Collective C would synchronize itself with collective B, which is a copy of collective A.
  3. Collective D would use collective C to synchronize itself with the changes that occurred on collectives A and B while collective D was orphaned (while collective C was down).
Ultimately, collectives A, B, C, and D would all become identical to one another again, as the sketch after this list illustrates.
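
The following Java sketch is illustrative only and uses hypothetical names rather than any product API. It models the line topology A - B - C - D as a small graph, removes collective C to orphan collective D, and then reconnects the links to show how collective D catches up with the changes it missed.

  import java.util.*;

  // Illustrative sketch of the line topology A - B - C - D and the failure of collective C.
  // The data map records which changes each collective has applied; the links map records
  // which links are currently connected.
  public class LineTopologySketch {
      static Map<String, Set<String>> data = new LinkedHashMap<>();
      static Map<String, Set<String>> links = new HashMap<>();

      static void link(String a, String b) {
          links.computeIfAbsent(a, k -> new HashSet<>()).add(b);
          links.computeIfAbsent(b, k -> new HashSet<>()).add(a);
      }

      // Propagate a change from its origin over whatever links are currently connected.
      static void applyChange(String origin, String change) {
          Deque<String> queue = new ArrayDeque<>(List.of(origin));
          Set<String> visited = new HashSet<>(List.of(origin));
          while (!queue.isEmpty()) {
              String current = queue.poll();
              data.get(current).add(change);
              for (String neighbor : links.getOrDefault(current, Set.of())) {
                  if (visited.add(neighbor)) {
                      queue.add(neighbor);
                  }
              }
          }
      }

      public static void main(String[] args) {
          for (String name : List.of("A", "B", "C", "D")) {
              data.put(name, new HashSet<>());
          }
          link("A", "B"); link("B", "C"); link("C", "D");              // line topology

          // Collective C fails: its links go down and its in-memory copy is lost.
          links.get("B").remove("C"); links.get("D").remove("C"); links.get("C").clear();
          data.get("C").clear();

          applyChange("A", "change-1");                                // D is orphaned and misses this
          System.out.println("D while C is down: " + data.get("D"));   // []

          // Collective C restarts: the links reconnect, C resynchronizes from B,
          // and D catches up with the missed changes through C.
          link("A", "B"); link("B", "C"); link("C", "D");
          for (String change : new ArrayList<>(data.get("B"))) {
              applyChange("B", change);
          }
          System.out.println("D after C returns:  " + data.get("D"));  // [change-1]
      }
  }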

Ring topologies

Ring topologies are an example of a more resilient topology. When a collective or a single link fails, the surviving collectives can still obtain changes. The changes travel around the ring, away from the failure. Each collective has at most two links to other collectives, no matter how large the ring topology grows. However, the latency to propagate changes can be large. Changes from a particular collective might need to travel through several links before all the collectives have the changes. A line topology has the same characteristic.
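
The following arithmetic sketch is illustrative only. It compares the worst-case number of links a change must cross in a line and in a ring that contain the same number of collectives, which is the source of the latency characteristic described above.

  // Illustrative comparison of the worst-case propagation distance (the number of links a
  // change must cross) for a line and for a ring of the same size.
  public class HopCountSketch {
      public static void main(String[] args) {
          for (int collectives = 4; collectives <= 12; collectives += 4) {
              int lineWorstCase = collectives - 1;   // end to end along the line
              int ringWorstCase = collectives / 2;   // halfway around the ring, in either direction
              System.out.printf("%d collectives: line worst case %d hops, ring worst case %d hops%n",
                      collectives, lineWorstCase, ringWorstCase);
          }
      }
  }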

Figure: Ring topology

You can also deploy a more sophisticated ring topology, with a root collective at the center of the ring. The root collective functions as the central point of reconciliation. The other collectives act as remote points of reconciliation for changes occurring in the root collective. The root collective can arbitrate changes among the collectives. If a ring topology contains more than one ring around a root collective, the root collective can arbitrate changes only among the innermost ring. However, the results of the arbitration spread throughout the collectives in the other rings.

Hub-and-spoke topologies

With a hub-and-spoke topology, changes travel through a hub collective. Because the hub is the only intermediate collective that is specified, hub-and-spoke topologies have lower latency. The hub collective is connected to every spoke collective through a link. The hub distributes changes among the collectives and acts as a point of reconciliation for collisions. In an environment with a high update rate, the hub might need to run on more hardware than the spokes to remain synchronized. WebSphere® DataPower XC10 Appliance is designed to scale linearly, meaning you can make the hub larger, as needed, without difficulty. However, if the hub fails, changes are not distributed until the hub restarts. Any changes on the spoke collectives are distributed after the hub is reconnected.
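
The following Java sketch is illustrative only and uses hypothetical names. It shows why the hub carries more load than the spokes: the hub holds one link per spoke, each spoke holds a single link, and a change crosses at most two links (spoke to hub to spoke) while the hub is available.

  import java.util.*;

  // Illustrative hub-and-spoke sketch: the hub carries one link per spoke, each spoke carries a
  // single link, and a change crosses at most two links (spoke -> hub -> spoke).
  public class HubAndSpokeSketch {
      public static void main(String[] args) {
          String hub = "A";
          List<String> spokes = List.of("B", "C", "D", "E");

          Map<String, Set<String>> links = new HashMap<>();
          links.put(hub, new HashSet<>(spokes));                     // the hub links to every spoke
          for (String spoke : spokes) {
              links.put(spoke, Set.of(hub));                         // each spoke links only to the hub
          }

          System.out.println("Links on the hub:  " + links.get(hub).size());   // 4, grows with the spokes
          System.out.println("Links on a spoke:  " + links.get("B").size());   // always 1
          System.out.println("Worst-case hops:   2 (spoke -> hub -> spoke)");
      }
  }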

Figure: Hub-and-spoke topology

Tree topologies

You can also use an acyclic directed tree. An acyclic tree has no cycles or loops, and the directed setup means that links exist only between parent and child collectives. This configuration is useful for topologies that have many collectives, where it is not practical to have a central hub that is connected to every possible spoke. This type of topology can also be useful when you must add child collectives without updating the root collective.

Figure: Tree topology

A tree topology can still have a central point of reconciliation in the root collective. The collectives on the second level can still function as remote points of reconciliation for changes occurring in the collectives beneath them. The root collective can arbitrate changes between the collectives on the second level only. You can also use N-ary trees, each of which has N children at each level. Each collective connects out to N links.
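
The following arithmetic sketch is illustrative only. It shows how the number of collectives in an N-ary tree grows with depth while each collective still connects out to at most N child links.

  // Illustrative arithmetic for an N-ary tree of collectives: each collective connects out to
  // at most N child links, while the number of collectives grows geometrically with depth.
  public class NaryTreeSketch {
      public static void main(String[] args) {
          int n = 3;                                        // children per collective
          for (int depth = 1; depth <= 3; depth++) {
              long total = 0;
              for (int level = 0; level <= depth; level++) {
                  total += Math.round(Math.pow(n, level));  // n^level collectives at tree level 'level'
              }
              System.out.printf("N=%d, depth %d: %d collectives, at most %d child links per collective%n",
                      n, depth, total, n);
          }
      }
  }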

Fully replicated clients

This topology variation involves a pair of servers that are running as a hub. Every client creates a self-contained single-container data grid with a catalog in the client JVM. A client uses its data grid to connect to the hub catalog, and synchronizes with the hub as soon as the connection is established.

Any changes made by the client are local to the client, and are replicated asynchronously to the hub. The hub acts as an arbitration collective, distributing changes to all connected clients. The fully replicated clients topology provides a good L2 cache for an object relational mapper, such as OpenJPA. Changes are distributed quickly among client JVMs through the hub. As long as the cache size can be contained within the available heap space of the clients, this topology is a good architecture for this style of L2 cache.
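
The following Java sketch is illustrative only; the Client class, the hub queue, and the put method are hypothetical and are not the appliance or eXtreme Scale client API. It models fully replicated clients in a single JVM: each client applies a write to its own local copy first, the write is queued asynchronously to a hub, and the hub fans the change out to every connected client.

  import java.util.*;
  import java.util.concurrent.*;

  // Illustrative sketch of fully replicated clients: local writes, asynchronous replication
  // to a hub, and fan-out from the hub to every connected client.
  public class FullyReplicatedClientsSketch {
      static class Client {
          final String name;
          final Map<String, String> localCache = new ConcurrentHashMap<>();  // client-local copy of the cache
          Client(String name) { this.name = name; }
      }

      static final List<Client> clients = new CopyOnWriteArrayList<>();
      static final BlockingQueue<String[]> hubQueue = new LinkedBlockingQueue<>();  // change feed to the hub

      // A local write: immediately visible on the originating client, queued for the hub asynchronously.
      static void put(Client origin, String key, String value) {
          origin.localCache.put(key, value);
          hubQueue.add(new String[] { origin.name, key, value });
      }

      public static void main(String[] args) throws Exception {
          Client clientOne = new Client("client-1");
          Client clientTwo = new Client("client-2");
          clients.add(clientOne);
          clients.add(clientTwo);

          // The hub drains queued changes and distributes each one to all connected clients.
          ExecutorService hub = Executors.newSingleThreadExecutor();
          hub.submit(() -> {
              while (!Thread.currentThread().isInterrupted()) {
                  String[] change = hubQueue.take();        // origin, key, value
                  for (Client client : clients) {
                      client.localCache.put(change[1], change[2]);
                  }
              }
              return null;
          });

          put(clientOne, "person:1", "Alice");              // local write on client-1
          Thread.sleep(200);                                // allow the asynchronous replication to run
          System.out.println(clientTwo.localCache);         // {person:1=Alice} on client-2 as well
          hub.shutdownNow();
      }
  }

In a real deployment the hub is a collective and the clients connect to it over the network; the queue and executor thread in this sketch only stand in for that asynchronous replication path.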

Use multiple partitions to scale the hub collective on multiple JVMs, if necessary. Because all of the data still must fit in a single client JVM, using multiple partitions increases the capacity of the hub to distribute and arbitrate changes, but it does not change the capacity of a single collective.