WebSphere Application Server synchronization process explained

Troubleshooting


Problem

This document explains the synchronization process for IBM WebSphere Application Server.

Resolving The Problem

Possible ways to synchronize repositories
Different types of synchronization operations
What is happening during a synchronization


Possible ways to synchronize repositories
  • Auto sync (performed by the node agent periodically in the background)
  • User explicitly initiated synchronization (through the console, wsadmin, or the MBean API, while the node agent is running)
  • syncNode.bat/sh (requires the node agent to be stopped)
  • Synchronization before Application Server startup (if the Startup Synchronization flag is checked)
  • Synchronization during node federation (performed by the addNode process as one of its tasks during node federation)
  • A restart JMX call on the NodeAgent MBean, if the first parameter, syncFirst, is set to true


Different types of synchronization operations
  • Normal (partial) synchronization (synchronizes only the files that dmgr thinks have changed)
  • Full synchronization (synchronize all files in the repositories)

Comparison of types of synchronization operations and possible ways to do synchronization

Auto sync
  • Normal (partial) synchronization: auto sync operations are usually partial synchronizations.
  • Full synchronization: the very first synchronization after the node agent starts is a full synchronization. An auto sync is also a full synchronization if a "refreshRepositoryEpoch" JMX call has been issued to change the epoch value of the repository.

User explicitly initiated synchronization
  • Normal (partial) synchronization: yes, if you select "Synchronize" in the console, or if you invoke the "sync" operation on the NodeSync MBean.
  • Full synchronization: yes, if you select "Full Resynchronize" in the console, or if you invoke the "refreshRepositoryEpoch" operation on the ConfigRepository MBean in the node agent process before invoking the "sync" operation on the NodeSync MBean.

syncNode.bat/sh, or synchronization during addNode
  • Normal (partial) synchronization: cannot do a partial synchronization.
  • Full synchronization: always a full synchronization.

Synchronization before Application Server startup, or restart() JMX call on the NodeAgent MBean with the syncFirst parameter set to true
  • Normal (partial) synchronization: if this is neither the first synchronization operation since the node agent started nor the first synchronization operation since a "refreshRepositoryEpoch" JMX call was invoked.
  • Full synchronization: if this is the very first synchronization operation the node agent performs since it started, or the very first synchronization operation it performs since a "refreshRepositoryEpoch" JMX call.
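The comparison above can be condensed into a small decision function. The following is only a sketch: the class, enum, and method names are hypothetical and are not part of any WebSphere API, and it assumes that an explicit "Synchronize" request behaves like the other node agent driven cases when it happens to be the first synchronization since startup or since an epoch refresh.

```java
public class SyncTypeSketch {
    /** Ways to trigger a synchronization; the constant names are illustrative only. */
    public enum Trigger {
        AUTO_SYNC,
        CONSOLE_SYNCHRONIZE,
        CONSOLE_FULL_RESYNCHRONIZE,
        SYNC_NODE_OR_ADD_NODE,
        STARTUP_OR_RESTART_WITH_SYNC_FIRST
    }

    /**
     * Returns true if the operation would be a full synchronization.
     *
     * @param firstSinceStartOrEpochRefresh true if this is the first synchronization
     *        since the node agent started or since a refreshRepositoryEpoch call
     */
    public static boolean isFullSync(Trigger trigger, boolean firstSinceStartOrEpochRefresh) {
        switch (trigger) {
            case SYNC_NODE_OR_ADD_NODE:
                // syncNode and addNode cannot do a partial synchronization.
                return true;
            case CONSOLE_FULL_RESYNCHRONIZE:
                // Refreshes the repository epoch first, so the next sync is full.
                return true;
            default:
                // Assumption: every other case is full only when the cache is empty.
                return firstSinceStartOrEpochRefresh;
        }
    }

    public static void main(String[] args) {
        for (Trigger t : Trigger.values()) {
            System.out.println(t + ": first=" + isFullSync(t, true)
                    + ", later=" + isFullSync(t, false));
        }
    }
}
```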


What is happening during a synchronization

Before discussing the synchronization operation in detail, review the following:
  • What is this "epoch" thing? How many epoch values are there?
    At a high level, you can think of epochs as a cache mechanism that the node agent and dmgr use to determine which files in the master repository have changed since the last synchronization operation. When the next synchronization operation is invoked, only the folders in the cache are compared. If the epochs for a folder in the cache differ, that folder is considered modified and is checked further later in the synchronization process.

    The refreshRepositoryEpoch JMX call cleans up the cache, which requires the next synchronization operation to check all folders in the repository, resulting in a full synchronization operation. It also lets dmgr and the node agent build a new cache, to be used by subsequent synchronization operations. This is really the only difference between a full synchronization and a partial synchronization: whether or not the cache is empty. The rest of the synchronization operation is the same for partial and full synchronization.

    This also explains why manually changed files are not pushed to the node repository during a partial synchronization: dmgr is not aware that the files have changed and does not put the folders containing such files into the cache. Since a partial synchronization only checks the cache, these folders are not checked. In a full synchronization, by contrast, all folders are checked and manually changed files are detected.

    At a more detailed level, epochs (implemented by com.ibm.websphere.management.repository.ConfigEpoch) are objects that contain a variable of type long. When an epoch is first initialized, this variable is set to System.currentTimeMillis(). Updating an epoch increases the variable by 1. Refreshing an epoch resets the variable to System.currentTimeMillis() again.

    There are two different types of epochs: repository epoch and folder epoch. That is, a repository has an epoch associated with it, referred to as the repository level epoch; each folder in the repository also has an epoch associated with it, referred to as a folder level epoch.

    The purpose of folder level epochs is clear. It is an indication of the state of the files in that folder.

    The purpose of repository level epoch is a bit more complex. Generally speaking, it is an indication of the state of the repository. The repository level epoch, however, can be manipulated. This is the epoch that gets refreshed on the refreshRepositoryEpoch JMX call (actually the epoch for the master repository is refreshed). The folder level epochs, upon a refreshRepositoryEpoch JMX call, are simply removed from the list containing them, forcing the next synchronization operation to rebuild the list (essentially rebuild the cache).

    Note: There are two sets of cell level and folder level epochs: one set for the dmgr repository and another set for the node repository. When epochs are compared, the epoch from one repository is compared with the corresponding epoch from the other repository. For example, the cell level epoch of the master repository is compared with the cell level epoch of the node repository, and a folder level epoch of the master repository is compared with the epoch of the same folder in the node repository.
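The epoch behavior described above (initialize to the current time, add 1 on update, reset to the current time on refresh) can be modeled in a few lines. This is a minimal sketch with a hypothetical class name, EpochSketch; it is not the actual ConfigEpoch implementation, whose methods are not shown in this document.

```java
public class EpochSketch {
    private long value;

    /** A new epoch starts at the current time in milliseconds. */
    public EpochSketch() {
        this(System.currentTimeMillis());
    }

    /** Lets a demo or test start from a fixed value instead of the clock. */
    public EpochSketch(long initialValue) {
        this.value = initialValue;
    }

    /** Updating an epoch increases its value by 1. */
    public void update() {
        value++;
    }

    /** Refreshing an epoch resets its value to the current time. */
    public void refresh() {
        value = System.currentTimeMillis();
    }

    public long value() {
        return value;
    }

    /** Two epochs match when their long values are equal. */
    public boolean matches(EpochSketch other) {
        return value == other.value;
    }

    public static void main(String[] args) {
        EpochSketch master = new EpochSketch(1_000L);
        EpochSketch node = new EpochSketch(1_000L);
        System.out.println("in sync before change: " + master.matches(node));

        master.update(); // a change in the master repository
        System.out.println("in sync after change:  " + master.matches(node));
    }
}
```

In this model, a mismatch between the master and node values is what tells synchronization that there is work to do.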

  • What are digests and how are digests being calculated?
    Checking digests (checksums) is how synchronization decides whether two files are the same. If the digests of two files are different, synchronization considers the files to be different and updates the node repository. Otherwise, it considers the two files to be the same and no file transfer occurs.

    The actual digest calculation is implemented through Java APIs (see com.ibm.ws.management.repository.DocumentDigestImpl, the calc() methods and the getMessageDigest() method). The following is how synchronization calculates the digest:

    MessageDigest oneMessageDigest = MessageDigest.getInstance("SHA");  // digest algorithm is "SHA"
    if (oneMessageDigest != null)
    {
      // Feed the data in the stream to the digest, one buffer at a time
      while ((bytesRead = input.read(buffer)) > 0)
      {
        oneMessageDigest.update(buffer, 0, bytesRead);
      }
      // Calculate the digest once all data has been read
      digest = oneMessageDigest.digest();
    }
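To make the digest comparison concrete, the following self-contained sketch computes "SHA" digests for in-memory documents and compares them. The class and method names are hypothetical; only java.security.MessageDigest and the "SHA" algorithm name come from the snippet above, and the real DocumentDigestImpl may differ in detail.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestCompareSketch {
    /** Reads the stream to its end and returns the "SHA" digest of its content. */
    public static byte[] digestOf(InputStream input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA");
            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = input.read(buffer)) > 0) {
                md.update(buffer, 0, bytesRead);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException | IOException e) {
            throw new IllegalStateException("could not compute digest", e);
        }
    }

    /** Documents are treated as identical when their digests are equal. */
    public static boolean sameDocument(byte[] masterDigest, byte[] nodeDigest) {
        return MessageDigest.isEqual(masterDigest, nodeDigest);
    }

    private static InputStream stream(String content) {
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] master = digestOf(stream("<server name=\"server1\"/>"));
        byte[] nodeCopy = digestOf(stream("<server name=\"server1\"/>"));
        byte[] nodeStale = digestOf(stream("<server name=\"server0\"/>"));

        System.out.println("identical copy needs transfer: " + !sameDocument(master, nodeCopy));
        System.out.println("stale copy needs transfer:     " + !sameDocument(master, nodeStale));
    }
}
```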



More details on how synchronization works
Synchronization operations are invoked by the node agent in cases such as auto sync or user explicitly initiated synchronization, or by the syncNode/addNode processes when syncNode.bat/sh or addNode.bat/sh is invoked. In all synchronization scenarios, the process that initiates the synchronization operation communicates with the dmgr. The dmgr retrieves information about the state of the master repository and compares it with the node repository. After comparing the epochs of the two repositories, the dmgr returns a list of changed folders to the node agent. Next, for each folder in the list, the dmgr compares the digests of the documents to see if the files are indeed different. The changed files are transferred to the node via file transfer and checked into the node repository by the node agent.
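The folder comparison in this overview can be pictured as a comparison of two maps of folder epochs. The sketch below is a simplified model with hypothetical names. In particular, the rule used to classify a folder as created, modified, or deleted is an assumption made for illustration, not the documented dmgr logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ModifiedFoldersSketch {
    public enum Change { CREATED, MODIFIED, DELETED }

    /**
     * Compares folder epochs of the master repository with those of the node
     * repository and returns the folders that differ, keyed by folder name.
     */
    public static Map<String, Change> modifiedFolders(Map<String, Long> masterEpochs,
                                                      Map<String, Long> nodeEpochs) {
        Map<String, Change> changes = new TreeMap<>();
        for (Map.Entry<String, Long> master : masterEpochs.entrySet()) {
            Long nodeEpoch = nodeEpochs.get(master.getKey());
            if (nodeEpoch == null) {
                // Assumption: unknown to the node means created in the master.
                changes.put(master.getKey(), Change.CREATED);
            } else if (!nodeEpoch.equals(master.getValue())) {
                changes.put(master.getKey(), Change.MODIFIED);
            }
        }
        for (String folder : nodeEpochs.keySet()) {
            if (!masterEpochs.containsKey(folder)) {
                // Assumption: missing from the master means deleted there.
                changes.put(folder, Change.DELETED);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, Long> master = new HashMap<>();
        master.put("cells/myCell", 1001L);
        master.put("cells/myCell/nodes/node1", 2000L);
        master.put("cells/myCell/applications/new.ear", 3000L);

        Map<String, Long> node = new HashMap<>();
        node.put("cells/myCell", 1000L);
        node.put("cells/myCell/nodes/node1", 2000L);
        node.put("cells/myCell/applications/old.ear", 2500L);

        System.out.println(modifiedFolders(master, node));
    }
}
```

Only the folders in the returned map would go on to the digest comparison; folders whose epochs match are skipped.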

A synchronization operation can start only if no other synchronization operation is already in progress; otherwise, the new operation waits for the ongoing one to finish before it starts.

Detailed steps:
1. Initialize:
  • Reset folder epoch values if necessary.
  • Send out websphere.nodesync.initiated notification.
  • Spawn a new thread to do the actual synchronization work.

2. Synchronize:
    • Synchronization is an iterative process (the maximum number of iterations is determined by SYNC_ITERATION_LIMIT, which is hard-coded to 3).
    • If the repository epoch for the dmgr and node agent are the same, then synchronization is done, go to step 3.
    • If the repository epochs do not match, then something has changed and you have some work to do:
      1. First you need to get a list of changed folders in master repository:
        • This is achieved by making the getModifiedFolders() JMX call on the CellSync MBean, which exists in the dmgr process.
        • The epochs for the folders in the node repository are passed to dmgr during this call.
        • dmgr compares the epochs from the node repository with those in its own repository to decide which folders have changed.
        • dmgr returns a list of modified folders as the result of the getModifiedFolders() call, along with the type of change that occurred to each folder (deleted, modified, created).

      2. For the list of changed folders:
        1. If the folder is deleted in master repository, delete it from node repository
        2. If the folder is modified/created in master repository, then you need to compare the digests:
          • Get the digests from the local (node) repository.
          • Invoke the getFolderSyncUpdates JMX call on the CellSync MBean, passing the digests for dmgr to compare with.
          • The dmgr compares the digests to determine which documents have been updated.
          • As the result of the getFolderSyncUpdates call, a list of changed files (created, modified, deleted) is returned.
          • Node agent downloads changed documents by way of file transfer.

      3. Go through the list of changed documents and check each document into the node repository.

    • After substep 3, check the repository epochs again. If they match, you are done: the repository has been synchronized. Otherwise, the master repository has changed during this synchronization operation, and you need to repeat substeps 1 to 3 to pick up the new changes.
    • Note that synchronization is repeated a maximum of 3 times. After that, it stops and outputs a message:
      ADMS0023I: A synchronization operation reached the iteration limit

3. Post synchronization operations:
    • Send synchronization completion notification.
    • Depending on which files are updated in the node repository, application management code might be invoked to expand the binary EAR to installedApps.

Note: The EAR is expanded if either of the following files is changed: the EAR file, or variables.xml (if the application install root variable is updated).
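The iterative loop in step 2 can be sketched as follows. The Repositories interface and its methods are hypothetical stand-ins for the epoch checks and the folder and document synchronization described above; only the SYNC_ITERATION_LIMIT value of 3 and the ADMS0023I message text are taken from this document.

```java
public class SyncLoopSketch {
    static final int SYNC_ITERATION_LIMIT = 3;

    /** Hypothetical view of the two repositories; not a WebSphere interface. */
    public interface Repositories {
        long masterEpoch();
        long nodeEpoch();
        /** Stands in for fetching changed folders, comparing digests, and checking in files. */
        void syncChangedFolders();
    }

    /**
     * Runs at most SYNC_ITERATION_LIMIT synchronization passes.
     * Returns true if the repository epochs match at the end.
     */
    public static boolean synchronize(Repositories repos) {
        for (int pass = 0; pass < SYNC_ITERATION_LIMIT; pass++) {
            if (repos.masterEpoch() == repos.nodeEpoch()) {
                return true; // nothing (more) to do
            }
            repos.syncChangedFolders();
        }
        // Final check after the last allowed pass.
        if (repos.masterEpoch() == repos.nodeEpoch()) {
            return true;
        }
        System.out.println("ADMS0023I: A synchronization operation reached the iteration limit");
        return false;
    }

    public static void main(String[] args) {
        Repositories quiet = new Repositories() {
            private long master = 1005L;
            private long node = 1000L;
            public long masterEpoch() { return master; }
            public long nodeEpoch() { return node; }
            public void syncChangedFolders() { node = master; }
        };
        System.out.println("synchronized: " + synchronize(quiet));
    }
}
```

If the master repository keeps changing while each pass runs, the epochs never match and the loop gives up after the third pass, which corresponds to the ADMS0023I message.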

To review tracing examples of the Synchronization process see this technote.

Document Information

Modified date:
15 June 2018

UID

swg21233075