Fixes are available
Refresh Pack 5.2.2 (June 2014) for Tivoli Storage Productivity Center
Fix Pack 5.1.1.5 (July 2014) for Tivoli Storage Productivity Center
Refresh Pack 5.2.3 (August 2014) for Tivoli Storage Productivity Center
Fix Pack 5.2.4 (November 2014) for Tivoli Storage Productivity Center
Fix Pack 5.2.4.1 (December 2014) for Tivoli Storage Productivity Center
Refresh Pack 5.2.5 (March 2015) for Tivoli Storage Productivity Center (withdrawn)
Fix Pack 5.1.1.6 (March 2015) for Tivoli Storage Productivity Center
Fix Pack 5.2.5.1 (April 2015) for Tivoli Storage Productivity Center (withdrawn)
Refresh Pack 5.2.6 (June 2015) for Tivoli Storage Productivity Center
Refresh Pack 5.2.7 (August 2015) for Tivoli Storage Productivity Center
Fix Pack 5.1.1.9 (October 2015) for Tivoli Storage Productivity Center
IBM Spectrum Control V5.2.8 (December 2015)
IBM Spectrum Control V5.2.9 (February 2016)
IBM Spectrum Control V5.2.10 (May 2016)
IBM Spectrum Control V5.2.10.1 (July 2016)
IBM Spectrum Control V5.2.11 (August 2016)
Fix Pack 5.1.1.12 (October 2016) for Tivoli Storage Productivity Center
Fix Pack 5.1.1.13 (February 2017) for Tivoli Storage Productivity Center
Fix Pack 5.1.1.14 (June 2017) for Tivoli Storage Productivity Center
Fix Pack 5.1.1.15 (Sept 2017) for Tivoli Storage Productivity Center
IBM Spectrum Control V5.2.12 (November 2016)
IBM Spectrum Control V5.2.13 (March 2017)
IBM Spectrum Control V5.2.14 (May 2017)
IBM Spectrum Control V5.2.15 (August 2017)
IBM Spectrum Control V5.2.15.2 (November 2017)
IBM Spectrum Control V5.2.16 (March 2018)
IBM Spectrum Control V5.2.17 (May 2018)
Fix Pack 5.1.1.8 (July 2015) for Tivoli Storage Productivity Center
Fix Pack 5.2.7.1 (February 2016) for Tivoli Storage Productivity Center
APAR status
Closed as fixed if next.
Error description
1. While the TPC-R server is stopped or has connectivity issues, if there is an error in a PPRC volume, the affected volume is marked with SUSPEND status and as non-recoverable when the TPC-R server is started or reconnected.
2. If there is another error in a PPRC volume (freeze trigger), TPC-R detects it and performs the FREEZE as expected, but TPC-R will mark the previous volume as recoverable. The pair will be marked with error IWNR2052E, showing that there may be a problem.

This happens because, after the Freeze operation, TPC-R queries the status of the secondaries and makes a best guess to determine consistency. In this case the secondary may or may not be consistent, because TPC-R cannot query whether the long busy is still occurring or, more importantly, whether data has been written since the thaw. At this point the pair is marked as recoverable if the hardware returns that the state is Full Duplex.

With this APAR, TPC-R marks the pair as non-recoverable, instead of recoverable, if it cannot determine consistency.

Circumvention: Use the Hardened Freeze option with TPC-R to ensure consistency and correct status while the TPC-R server is inactive or disconnected.
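The change in how a pair is marked can be illustrated with a minimal sketch. This is not TPC-R code: the `PairStatus` type, its field names, the `"FULL_DUPLEX"` string and both functions are hypothetical, and the `suspended_while_unmonitored` flag is only an assumed stand-in for "TPC-R cannot determine consistency".

```python
from dataclasses import dataclass


@dataclass
class PairStatus:
    """Hypothetical view of one Metro Mirror pair after a freeze."""
    target_state: str                  # state reported by the hardware (assumed spelling)
    suspended_while_unmonitored: bool  # pair suspended while the server was down or disconnected


def is_recoverable_before_fix(pair: PairStatus) -> bool:
    # Behavior described above: a Full Duplex answer from the
    # hardware is taken to mean the secondary is consistent.
    return pair.target_state == "FULL_DUPLEX"


def is_recoverable_after_fix(pair: PairStatus) -> bool:
    # Behavior described for the APAR: when consistency cannot be
    # determined, report the pair as non-recoverable.
    if pair.suspended_while_unmonitored:
        return False
    return pair.target_state == "FULL_DUPLEX"


pair = PairStatus(target_state="FULL_DUPLEX", suspended_while_unmonitored=True)
print(is_recoverable_before_fix(pair), is_recoverable_after_fix(pair))
```

For a pair that was suspended while the server was unavailable, the first function still reports it as recoverable, while the second errs on the side of caution.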
Local fix
Problem summary
USERS AFFECTED:
Customers managing DR could experience this if the Replication server is brought down and problems occur on the copy relationships during this period.

PROBLEM DESCRIPTION:
| fix pack | 5.1.1-TIV-TPC-FP0005 - target 2Q 2014 |
| release | 5.2.1-TIV-TPC-FP0000 - target 1Q 2014 |

http://www-01.ibm.com/support/docview.wss?&uid=swg21320822

The target dates for future fix packs do not represent a formal commitment by IBM. The dates are subject to change without notice.

1. Suppose that TPC-R is stopped or has connectivity issues. In this case, if there is an error in a PPRC volume, the affected volume goes into SUSPEND status and is non-recoverable when TPC-R becomes active again.

2. If there is now another error in a PPRC volume (freeze trigger), TPC-R detects it and performs the FREEZE as expected, but TPC-R will mark the previous volume as recoverable. The volume should stay in a 'non recoverable' status, as it was prior to the FREEZE.

This could confuse the customer, who might decide to move to the secondary copy while it is not in a consistent state.

To a degree TPC-R is working as expected; however, the product should err on the side of caution if TPC-R is shut down during the time of the error.

TPC-R is behaving as expected. Because TPC-R was shut down during the suspend event, there is no way that we can guarantee consistency. When TPC-R comes back up, it does detect the suspend event and reacts as best it can.

On startup the session knows about the suspended volume:

Session Name = Session_NAME
Session State = Prepared
State Description = No Description Provided
Session Status = Severe
Description =
Is Recoverable? = false
Is Shadowing? = true
Total # copysets = 786
Were Errors Found = true
Production Host = H1
Production Host with Mode = H1
Copy Rules:
  Name = Metro Mirror Failover/Failback
  Type = MM
  Number of Volumes for copytype = 2
HWTypes in session = ESS |
Sites:
  Site 1:
    Sant Cugat
  Site 2:
    Cerdanyola CD1
Status Messages on session Session_NAME:
Sequences in session Session_NAME:
  Sequence Name = H1-H2
  isRecoverable = false
  isShadowing = true
  # Exceptions = 0
  # Shadowing = 785
  # Recoverable = 785
  # in HW CG = 0
  direction = true
  Timestamp = n/a
  Progress = CopyProgress: total=132224400, copied=132224399, progress=99, timeEstimate=null
  # of Pairs = 786
  Base Copy Type = MM
  Pair State Counts:
    State Name: Suspended # volumes in this state: 1 <--- note volume is suspended
    State Name: Prepared # volumes in this state: 785

The pair that was tampered with has return code 8, and the others have return code 10 (suspended to maintain consistency):

2013-11-21 15:20:08.837+0100 CSMSEC-2F RepMgr D com.ibm.csm.server.hw.ElementCatalogEventNotifier$EventManagerEventNotifier sendBulkEvent TRACE: (HWL-EVENT) Sending 131 normal events:
ET=1:PAIR:RC=2:DS8000:2107.CT171:VOL:0001:DS8000:2107.KH321:VOL:0001:SUSPENDED (8) :numberOutOfSync=0::TS=1385043608821
ET=1:PAIR:RC=2:DS8000:2107.CT171:VOL:0002:DS8000:2107.KH321:VOL:0002:SUSPENDED (10) :numberOutOfSync=0::TS=1385043608821

TPC-R then thaws the sequence:

2013-11-21 15:20:09.196+0100 CSMSEC-30 RepMgr I com.ibm.csm.server.session.SessionMgr === ZOS_MM_HOST_SITE0 === runOperation for session ZOS_MM_HOST_SITE0 KEY EVENT: ******************** Running CmdAction thaw to sequence H1-H2

TPC-R then marks the session as suspended by a freeze operation:

2013-11-21 15:20:12.353+0100 CSMSEC-30 KeyEventLog D StateMgr changeSessionState KEY EVENT: SESSION: ZOS_MM_HOST_SITE0 -- STATE: Suspended -- H1
*** STATE DESCRIPTION: No Description Provided
*** CAUSED BY OPERATION OR EVENT: FREEZE_OP

A separate thread is then started that checks for consistency on the target. Unfortunately, there is no hardware query for whether the pair is consistent: TPC-R cannot query whether the long busy is still occurring or, more importantly, whether the volume has actually been updated since its thaw. Therefore TPC-R checks the state of the target volume, which comes back as duplex. According to the algorithm, that means the volume is consistent, so TPC-R marks it as consistent:

2013-11-21 15:20:12.400+0100 CSMSRV-1CC RepMgr D > com.ibm.csm.server.session.policy.actions.CheckMMConsistency$CheckThread CheckThread() Entry, parm 1 = check thread created for driver ESS:2107.CT171:37920:0::ESS:2107.KH321:5408:0
2013-11-21 15:20:18.790+0100 CSMSRV-1CC RepMgr D > StateMgr setRecoverable Entry, parm 1 = boolean: true

However, that pair is still marked with an error, which signals to the user that something is wrong:

source: DS8000:2107.CT171:VOL:0001, target: DS8000:2107.KH321:VOL:0001, db state: Suspended, db pend state: Suspended, hw event state: SUSPENDED, hw reason code: 8, hw source state: 2, hw target state: 2
Suspended | SOURCE ID: DS8000:2107.CT171:VOL:0001 | SOURCE NICKNAME: S0F007 | TARGET ID: DS8000:2107.KH321:VOL:0001 | TARGET NICKNAME: S0F007 | RECOVERABLE: true | SHADOWING: false | LAST RESULT MSG ID: IWNR2052E

If the customer is running on z/OS and wants to be able to manage situations where TPC-R is down, they will need to look into using the Hardened Freeze option -> http://www.redbooks.ibm.com/redpieces/pdfs/sg247563.pdf

Enable Hardened Freeze

Use this option to let z/OS Input/Output Supervisor (IOS) manage freeze operations for the volumes in the session, which prevents Tivoli Storage Productivity Center for Replication from freezing the volumes and possibly freezing itself. We recommend you to use this option if you put system volumes, like SYSRES and page data sets, into the Copy Sets of a Metro Mirror session.

Customers will need the following pre-requisites to implement this function:

- z/OS at 1.13 level, with APAR OA37632 installed;
- z/OS address spaces Basic HyperSwap Management (HSIB) and Basic HyperSwap API (HSIBAPI) must be active, even if you are not going to exploit BHS

Hardened Freeze puts the configuration in a paged (freeze safe) area within IOS to ensure consistency.

RECOMMENDATION:
Always use a High Availability standby server and issue a takeover in the event that the active server requires a shutdown for maintenance or outage.
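
When reviewing a trace like the one in the problem description, it can help to separate the pair that failed on its own (code 8) from the pairs suspended to maintain consistency (code 10). The following is a minimal sketch, not an IBM tool. The regular expression is inferred only from the two hardware event lines quoted above, and the field names (`source`, `target`, `state`, `reason`) and the function name are hypothetical.

```python
import re

# Pattern inferred from the two sample HWL-EVENT lines only; real
# traces may contain variants that this does not cover.
EVENT_RE = re.compile(
    r"PAIR:RC=\d+:"
    r"(?P<source>\S+?:VOL:\w+):"
    r"(?P<target>\S+?:VOL:\w+):"
    r"(?P<state>\w+) \((?P<reason>\d+)\)"
)


def parse_pair_event(line):
    """Return source, target, state and numeric code, or None if no match."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    return {
        "source": m["source"],
        "target": m["target"],
        "state": m["state"],
        "reason": int(m["reason"]),
    }


events = [
    "ET=1:PAIR:RC=2:DS8000:2107.CT171:VOL:0001:DS8000:2107.KH321:VOL:0001:"
    "SUSPENDED (8) :numberOutOfSync=0::TS=1385043608821",
    "ET=1:PAIR:RC=2:DS8000:2107.CT171:VOL:0002:DS8000:2107.KH321:VOL:0002:"
    "SUSPENDED (10) :numberOutOfSync=0::TS=1385043608821",
]

parsed = [parse_pair_event(e) for e in events]
# Pairs reported with code 8, as opposed to code 10.
originally_failed = [p["source"] for p in parsed if p and p["reason"] == 8]
print(originally_failed)
```

With the two sample lines, only the source volume of the first pair is selected, which matches the pair that the session output later shows with message IWNR2052E.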
Problem conclusion
Temporary fix
Comments
APAR Information
APAR number
IC98387
Reported component name
TPC
Reported component ID
5608TPC00
Reported release
511
Status
CLOSED FIN
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2013-12-18
Closed date
2014-02-17
Last modified date
2014-07-29
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Applicable component levels
R511 PSY
UP
R520 PSY
UP
Document Information
Modified date:
23 March 2022