IBM Support

It may take a long time for an updateable secondary (HDR or RSS) to become available for SQL execution

Question & Answer


Question

During restart, an Informix updateable secondary server (either HDR or RSS) may spend a significant amount of time in 'fast recovery' mode before it becomes available for SQL execution. In this scenario the data replication subsystem writes the following message to the message log: 'Started processing open transactions on secondary during startup'. Why does this happen?

Answer

Locks on a secondary server are created by the recovery threads as they replay logical log records. If a secondary server is restarted while transactions are open on the primary server, it is not able to create the locks for those transactions unless it has replayed the log records written prior to the initial checkpoint.

The updateable secondary server must therefore be kept in recovery mode until all such previously open transactions have completed, in order to maintain the ‘committed read last committed’ isolation level.

The open transactions that prevent the secondary server from leaving 'fast recovery' mode are visible in the 'onstat -x' output on the secondary. The following is an example of 'onstat -x' output:



Transactions                                                                                        
address          flags userthread       locks  begin_logpos      current logpos    isol    rb_time  retrys coord
700000030591028  A---- 70000003054f028  0      -                 -                 COMMIT  -        0      
700000030591358  A---- 70000003054f850  0      -                 -                 COMMIT  -        0      
700000030591688  A---- 700000030550078  0      -                 -                 COMMIT  -        0      
<...>
700000030595978  A---- 70000003055abc0  0      -                 -                 DIRTY   -        0      
700000030595ca8  A-B-- 70000003055bc10  0      124:0x0           124:0x2ef018      DIRTY   0:00     0      


In the above output the 'begin_logpos' value for the last transaction listed (flags 'A-B--') is '124:0x0', where 124 is the logical log number and 0x0 is the offset into that logical log. The offset is '0x0' simply because the secondary server has been restarted and does not know the address at which the transaction began. Therefore, the secondary server remains in 'fast recovery' mode (although data replication continues to work) until that transaction is committed or rolled back on the primary server.
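The check described above can be scripted. The following is a minimal sketch, not an IBM-supplied tool: the function name find_blocking_tx is hypothetical, and it assumes the 'onstat -x' column layout matches the sample output, with begin_logpos as the fifth whitespace-separated field.

```shell
# find_blocking_tx (hypothetical helper): reads 'onstat -x' output on stdin
# and prints the address, userthread, begin_logpos and current logpos of every
# transaction whose begin_logpos has offset 0x0.
# Assumption: fields are laid out as in the sample output
# ($1 address, $3 userthread, $5 begin_logpos, $6 current logpos).
find_blocking_tx() {
    awk '$1 ~ /^[0-9a-f]+$/ && $5 ~ /^[0-9]+:0x0$/ { print $1, $3, $5, $6 }'
}

# Usage on the secondary server:
#   onstat -x | find_blocking_tx
```

Lines whose begin_logpos is '-' are skipped, so only transactions that were already open when the secondary restarted are reported.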

On the primary server, you can find the open transaction by running onstat -x and searching (with grep) for the current logpos shown on the secondary:

onstat -x | grep 124:0x2ef018

On the primary server, you can find the user and session ID by running onstat -u and searching for the userthread address obtained from the previous onstat -x output (replace userthread with the actual address):

onstat -u | grep userthread

After investigating the session (onstat -g ses sid), you might determine that it is safe to terminate it so that the secondary can complete fast recovery:

onmode -z sessionid
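The lookup steps on the primary can be chained together. The following is a minimal sketch under stated assumptions: both function names are hypothetical, the 'onstat -x' layout is assumed to match the sample output above, and the 'onstat -u' layout is assumed to list the userthread address in the first column, the session ID in the third, and the user name in the fourth. Verify both layouts against your server version before relying on it.

```shell
# userthread_for_logpos LOGPOS (hypothetical helper): reads 'onstat -x' output
# on stdin and prints the userthread of the transaction whose current logpos
# equals LOGPOS. Appending "" forces a string comparison in awk.
userthread_for_logpos() {
    awk -v pos="$1" '($6 "") == pos { print $3 }'
}

# session_for_userthread ADDR (hypothetical helper): reads 'onstat -u' output
# on stdin and prints the session ID and user name for userthread ADDR.
# Assumption: $1 is the address, $3 the session ID, $4 the user name.
session_for_userthread() {
    awk -v addr="$1" '($1 "") == addr { print $3, $4 }'
}

# Usage on the primary server, with the logpos taken from the secondary:
#   ut=$(onstat -x | userthread_for_logpos 124:0x2ef018)
#   onstat -u | session_for_userthread "$ut"
# Then inspect the session with 'onstat -g ses <sid>' before deciding
# whether 'onmode -z <sid>' is appropriate.
```

The sketch deliberately stops short of calling onmode -z, so that terminating the session remains a manual decision. If the transaction has written further log records since the secondary's output was captured, the exact match on the current logpos may find nothing.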


Document Information

Modified date:
03 June 2021

UID

swg21626164