Overview of 8.5.2 Improvements in DAOS Catalog
How do the DAOS catalog and resync work in Domino 8.5.2?
DAOS Catalog Overview:
The DAOS catalog (daoscat.nsf) keeps track of all databases that are participating in DAOS and of all the NLO files in the system. A database contains a DAOS ticket, which has a 'hint' to the location of the NLO file on disk. When an object is opened, this hint is used to find the NLO. If the NLO is not in the specified location, Domino checks the catalog to see where it specifies the NLO should be, and that path is used to open the NLO. The daoscat.nsf is also used to determine when NLO's are no longer referenced and thus become candidates to be pruned. The prune process runs nightly and deletes all NLO's with a 0 reference count that have been on the deletion list for longer than the prune interval. Prune runs only when the state of the catalog is SYNCHRONIZED.
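The lookup and prune rules above can be illustrated with a minimal Python sketch. This is a model of the described behavior only, not Domino code: `CatalogEntry`, `resolve_nlo`, `prune_candidates`, and the `days_on_deletion_list` field are hypothetical names introduced for illustration, and the catalog is represented as a plain dictionary.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical stand-in for one NLO record in the DAOS catalog."""
    path: str                       # where the catalog says the NLO is
    refcount: int                   # references held by databases
    days_on_deletion_list: int = 0  # assumed bookkeeping field


def resolve_nlo(hint: str, key: str, catalog: Dict[str, CatalogEntry],
                exists: Callable[[str], bool] = os.path.exists) -> Optional[str]:
    """Use the ticket's hint first; fall back to the catalog's path."""
    if exists(hint):
        return hint
    entry = catalog.get(key)
    if entry is not None and exists(entry.path):
        return entry.path
    return None  # neither the hint nor the catalog leads to the NLO


def prune_candidates(catalog: Dict[str, CatalogEntry], catalog_state: str,
                     prune_interval_days: int) -> List[str]:
    """Return keys of NLOs that prune would be allowed to delete."""
    if catalog_state != "SYNCHRONIZED":
        return []  # prune only runs against a SYNCHRONIZED catalog
    return [key for key, entry in catalog.items()
            if entry.refcount == 0
            and entry.days_on_deletion_list > prune_interval_days]
```

The `exists` parameter is there only so the sketch can be exercised without touching a real DAOS repository.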
Resync Process Overview:
The DAOS resync process updates the DAOS catalog to a known good state. It does this by first populating the catalog with the list of databases participating in DAOS and then populating the list of NLO's. Then it scans all databases on the server, counting their DAOS references and updating the catalog with the relevant information. The following are the state transitions of the DAOS Catalog during resync.
Catalog State Table
| UNAVAILABLE | DAOS Catalog is not available. |
| REBUILDING* | The DAOS Object Index (DOI) is being rebuilt. |
| RESYNCING | Databases are being scanned for DAOS references. |
| SYNCHRONIZED | All DAOS-enabled databases are in a known state. |
| NEEDS_RESYNC | There is at least one DAOS-enabled database in an unknown state. |
Database State Table
| Resync: Pending | Database is waiting to be scanned for DAOS tickets. |
| Resync: Scanning | Database is currently being scanned for DAOS tickets. |
| Resync: Updating Refs | Database's tickets are being processed, DAOS references incremented. |
| Synchronized | Database is in a known state, all DAOS tickets counted in the catalog. |
| Deleted | Database has been deleted and is being scanned for DAOS tickets. |
| Deleted: Updating Refs | Deleted database's tickets are being decremented in DAOS catalog. |
| Adopt: New DB* | A new database has been discovered and is waiting to be scanned. |
UNAVAILABLE - During this state, no new objects can go into DAOS. Existing objects whose hints are correct may be opened. While in this state, resync is rebuilding the DAOS ID Table (DIT), which is the table of all databases on the server that are participating in DAOS. During this stage of the process, the daoscat.nsf must be locked. Once this state is complete, all databases will be in the "Resync: Pending" state. This phase of the resync takes a relatively short amount of time to complete.
REBUILDING - During this state the DAOS Object Index (DOI) is being rebuilt. The DOI is the table that contains the list of NLO's on the system and their associated metadata, including each NLO's relative path and reference count. During this state, the catalog is available and new objects may be created in DAOS. The old DOI remains in use while a new secondary DOI is built. This allows attachments to be opened even if the hints in the database are incorrect, assuming the old DOI has the NLO listed in it. (Note that this is a new state as of 8.5.2. Previously, the catalog would be in the UNAVAILABLE state while the DOI was rebuilt. This new state allows DAOS to be available while the new DOI is being rebuilt.)
RESYNCING - During this state, the resync process is scanning all DAOS-enabled databases, collecting their ticket lists, and applying these ticket lists to the DOI, for example, updating the refcounts for each reference to the NLO contained in the database. The databases transition through the following states as they are scanned and their tickets are processed. This is the most time-consuming phase of the resync process. Also note that the catalog may be left in the RESYNCING state after resync exits. See the resync quick option below.
- Resync: Pending
- Resync: Scanning
- Resync: Updating Refs
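As an illustration of the RESYNCING phase, the following Python sketch walks each database through the three states listed above while rebuilding reference counts from ticket lists. It is a simplified model under stated assumptions: `resync_scan` is a hypothetical name, tickets are reduced to plain NLO keys, and every scan is assumed to complete successfully.

```python
from typing import Dict, Iterable, List, Tuple


def resync_scan(nlo_keys: Iterable[str],
                tickets_by_db: Dict[str, List[str]]
                ) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Rebuild NLO refcounts from the ticket lists of each database."""
    # Every known NLO starts with a reference count of 0.
    refcounts = {key: 0 for key in nlo_keys}
    # Every participating database starts out waiting to be scanned.
    db_state = {db: "Resync: Pending" for db in tickets_by_db}

    for db, tickets in tickets_by_db.items():
        db_state[db] = "Resync: Scanning"
        ticket_list = list(tickets)  # collect the database's ticket list

        db_state[db] = "Resync: Updating Refs"
        for key in ticket_list:
            # One increment per reference to the NLO in this database.
            refcounts[key] = refcounts.get(key, 0) + 1

        db_state[db] = "Synchronized"

    return refcounts, db_state
```

An NLO that ends the scan with a count of 0 is unreferenced, which is what later makes it a candidate for prune.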
SYNCHRONIZED - All databases have been successfully scanned, object refcounts updated, and databases updated to the SYNCHRONIZED state.
NEEDS_RESYNC - The DAOS catalog needs resync run on it. During this state, DAOS will continue to function normally, but Prune will be unable to delete NLO's with 0 refcounts. The database or databases that caused the catalog to transition to NEEDS_RESYNC will be in the Resync: Needed state. Additional information about the root cause is provided in the associated DDM event.
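The catalog states described above can be summarized as a small state model. The Python sketch below is illustrative only: `CatalogState`, `next_state`, and `accepts_new_objects` are hypothetical names, and the transition from NEEDS_RESYNC back to the first phase is an assumption rather than something this document states.

```python
from enum import Enum


class CatalogState(Enum):
    UNAVAILABLE = "UNAVAILABLE"
    REBUILDING = "REBUILDING"
    RESYNCING = "RESYNCING"
    SYNCHRONIZED = "SYNCHRONIZED"
    NEEDS_RESYNC = "NEEDS_RESYNC"


# Order in which a full resync moves the catalog through its states.
RESYNC_SEQUENCE = [
    CatalogState.UNAVAILABLE,   # DIT is rebuilt, daoscat.nsf is locked
    CatalogState.REBUILDING,    # DOI is rebuilt, catalog is usable
    CatalogState.RESYNCING,     # databases are scanned for tickets
    CatalogState.SYNCHRONIZED,  # all reference counts are known
]


def next_state(state: CatalogState) -> CatalogState:
    """Return the state that follows `state` during a full resync."""
    if state is CatalogState.NEEDS_RESYNC:
        # Assumption: a resync from NEEDS_RESYNC starts at the first phase.
        return RESYNC_SEQUENCE[0]
    index = RESYNC_SEQUENCE.index(state)
    return RESYNC_SEQUENCE[min(index + 1, len(RESYNC_SEQUENCE) - 1)]


def accepts_new_objects(state: CatalogState) -> bool:
    """New objects can go into DAOS in every state except UNAVAILABLE."""
    return state is not CatalogState.UNAVAILABLE
```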
Resync Time Window:
Another new feature added in 8.5.2 allows an administrator to control when DAOS resync is allowed to run. While the impact of resync on a running server has been greatly reduced by improvements in 8.5.2, it still may not be desirable for resync to run during production hours. The following two notes.ini parameters control when resync is allowed to run. Note that these parameters do not launch resync automatically; they only control whether or not resync may run at a given time, and tell resync to stop at the stop time.
The format of the time is as follows: HH:MM:SS AM/PM
The above example will allow resync to run between midnight and 4:00 AM. A program document could be created to launch resync just after midnight. If the catalog is SYNCHRONIZED, then resync will not run. However, if a resync is needed, then resync will start the process described above. When the clock reaches the end time, the resync threads will finish the databases they are processing and exit. Because each thread finishes the NSF it is processing, the work may run slightly past the specified stop time. The next night, when resync is run again, it will pick up where it left off.
If resync finishes processing all databases during the time window, then the catalog will go to the SYNCHRONIZED state. At this point there is still a little more work to be done on the catalog, and this processing is completed even if the time window has passed. Therefore, resync can run past the end time, but the remaining work is low-impact and will not affect server response time.
Note that until resync scans all databases on the system and puts them into the SYNCHRONIZED state, the DAOS catalog will remain in the RESYNCING state. This indicates that resync still has work to be done. It does not, however, indicate that resync is currently running.
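The time-window check can be sketched in Python as follows. This is an illustration of the HH:MM:SS AM/PM format and the midnight-to-4:00 AM example, not the server's implementation: `parse_ini_time` and `resync_allowed` are hypothetical names, and treating the start time as inclusive, the stop time as exclusive, and a window that wraps past midnight as valid are all assumptions.

```python
from datetime import datetime, time

# HH:MM:SS AM/PM, for example "12:00:00 AM"
TIME_FORMAT = "%I:%M:%S %p"


def parse_ini_time(value: str) -> time:
    """Parse a time written in the HH:MM:SS AM/PM format."""
    return datetime.strptime(value.strip(), TIME_FORMAT).time()


def resync_allowed(now: time, start: time, stop: time) -> bool:
    """Return True if `now` falls inside the allowed resync window."""
    if start <= stop:
        # Window within a single day, such as midnight to 4:00 AM.
        return start <= now < stop
    # Assumption: a window such as 10 PM to 4 AM wraps past midnight.
    return now >= start or now < stop
```

For example, with a window from `parse_ini_time("12:00:00 AM")` to `parse_ini_time("04:00:00 AM")`, a check at 2:30 AM falls inside the window and a check at 9:00 AM does not.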
Other resync options:
force - A resync can be forced to run by passing the force option, for example, 'tell daosmgr resync force'. This command overrides the time window and resyncs the catalog even if the catalog state is SYNCHRONIZED. A force also causes resync to start over at the beginning, rebuilding the DIT and the DOI no matter what state the catalog was previously in.
quick - A resync quick rebuilds only the DIT and the DOI and does not scan the databases. This is useful if it is determined that a resync is needed but it is during production hours. A resync quick gets DAOS to an operational state, except that there will be no reference counts on the NLO's. That means new NLO files can be created and all existing ones can be read, but there will be no information on reference counts. The catalog is left in the RESYNCING state. Because there is no reference count information, the nightly prune process will not be allowed to run. The next time a DAOS resync starts, it will pick up where it left off, starting with the scanning of the databases as described above.
The above options can be used together as well. A 'tell daosmgr resync quick force' will allow a quick resync to be run on a SYNCHRONIZED catalog, leaving the catalog in the RESYNCING state. A later run of resync without the force option will pick up where it left off.
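The interaction of the force and quick options can be summarized in a short Python sketch. It models only what is described above and ignores the time window; `plan_resync` and the step labels are hypothetical, and the handling of a plain quick request against a catalog that is already RESYNCING (nothing left to rebuild) is an assumption.

```python
from typing import List


def plan_resync(state: str, force: bool = False, quick: bool = False) -> List[str]:
    """Return the resync phases that would run for a given request."""
    if state == "SYNCHRONIZED" and not force:
        return []  # nothing to do unless the resync is forced

    steps: List[str] = []
    # Without force, a catalog left in RESYNCING resumes at the scan phase.
    resuming = state == "RESYNCING" and not force
    if not resuming:
        steps += ["rebuild DIT", "rebuild DOI"]
    if not quick:
        steps.append("scan databases")
    return steps
```

A plan that ends without "scan databases" corresponds to the catalog being left in the RESYNCING state, so a later run without force picks up at the scan.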
Software version: 8.5.2, 8.5.3, 9.0, 9.0.1
Operating system(s): AIX, IBM i, Linux, Solaris, Windows, z/OS
Reference #: 1448635
Modified date: 01 October 2010