A fix is available
APAR status
Closed as program error.
Error description
*********************
* HIPER information *
*********************
On very rare occasions, Q Capture might miss some rows or send duplicate rows (SQL0803N) during spilling. The missing rows problem can be caused by two threads (publisher and spill) accessing the same transaction object simultaneously, one to publish it and one to spill it. If you have not applied this APAR, we recommend increasing the memory limit, and possibly the region size, to avoid spilling.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: 1- Q/SQL Capture                             *
****************************************************************
* PROBLEM DESCRIPTION: 1- race between the publishing and      *
*                         spilling threads causes missing rows *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
1- A flag is checked to ensure that only transactions whose commit has not yet been read (incomplete transactions) are spilled.
Problem conclusion
1- The problem has been fixed.
When the log reader thread of the Capture program needs to free up memory, it starts writing transactions to disk. This behavior is called spilling. A problem can arise when Capture spills a captured transaction that is in the process of being published by the publishing (worker) thread. The log reader thread checks that a captured transaction is not being published before spilling it, but the check in the code is incorrect: there is a timing hole during which the log reader thread does not detect that the worker thread is about to start publishing the transaction. As a result, the Capture program's internal memory for the transaction container is corrupted. The symptoms of this problem include missing rows, duplicate rows, or both at the target. The problem can affect the spilled transaction as well as subsequent transactions.
The defect occurs if all of the following conditions are met:
- The Capture program has used almost all of its memory (controlled by MEMORY_LIMIT) to capture transactions.
- There is one captured, committed transaction that the worker thread has not yet published in this commit interval; this transaction is the largest transaction in memory, and the worker thread is just about to start publishing it.
- There is still more than 64 KB of memory available before MEMORY_LIMIT is reached (otherwise, the log reader thread would sleep to give the worker thread a chance to free up memory). The log reader thread then reads more records from the log, which causes MEMORY_LIMIT to be exceeded, and starts spilling the largest transaction in memory to disk.
- The worker thread starts publishing the transaction that the log reader thread is currently spilling.
This is a rare condition that depends on I/O and processor speeds. If the worker thread gets to the transaction first, there is no problem. If the log reader thread finishes spilling the transaction before the worker thread publishes it, there is no problem either.
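The timing hole described above is an instance of a check-then-act race. The following Python sketch is an illustration only and is not taken from the Capture source code: the `Transaction` class, its `state` values, and the functions `unsafe_interleaving` and `try_claim` are hypothetical names. It replays the problematic interleaving deterministically (both threads check before either one acts) and contrasts it with a check and state change performed atomically under a lock. The APAR itself takes a different approach: committed transactions are simply never selected for spilling.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical stand-in for a captured transaction held in memory."""
    txid: int
    size_bytes: int
    state: str = "in_memory"  # one of: in_memory, publishing, spilled
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def unsafe_interleaving(txn: Transaction) -> bool:
    """Replay the timing hole: both threads check before either one acts.

    Returns True when both the log reader and the worker conclude that
    they may touch the transaction, which is the defective outcome.
    """
    reader_sees_free = txn.state == "in_memory"  # log reader: safe to spill?
    worker_sees_free = txn.state == "in_memory"  # worker: safe to publish?
    # Neither check changed any state, so both threads would now proceed.
    return reader_sees_free and worker_sees_free


def try_claim(txn: Transaction, new_state: str) -> bool:
    """Check and change the state in one atomic step.

    Only the first caller succeeds; a later caller sees the changed state
    and backs off.
    """
    with txn.lock:
        if txn.state != "in_memory":
            return False
        txn.state = new_state
        return True
```

With this sketch, `unsafe_interleaving` reports that both threads would proceed, whereas with `try_claim` a spill claim followed by a publish claim on the same transaction lets only the first one through.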
The fix is that Capture now spills only transactions for which a commit has not yet been read from the log. A completely captured transaction is never spilled, because its memory is freed once it is published and committed to MQ (for Q Replication) or to the DBMS (for SQL Replication). Capture chooses the largest incomplete transaction in memory to spill to disk. If there is no incomplete transaction to spill, but some transactions were captured, Capture sleeps repeatedly, as per COMMIT_INTERVAL, until the published transactions are committed and space is freed. If no transaction was captured, Capture tries to exceed its MEMORY_LIMIT budget; if no memory is available from the system, Capture stops with an error. Even so, this fix prevents any data loss or duplicate data.
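The selection rule described above can be summarized in a short Python sketch. This is an illustration under stated assumptions, not the Capture implementation: the `Txn` record, the `commit_seen` flag, the `choose_action` function, and the action names `"spill"`, `"sleep"`, and `"exceed_limit"` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Txn:
    """Hypothetical in-memory transaction record."""
    txid: int
    size_bytes: int
    commit_seen: bool  # True once the commit record was read from the log


def choose_action(txns: Sequence[Txn]) -> Tuple[str, Optional[Txn]]:
    """Decide what the log reader does when memory must be freed.

    Returns (action, victim), where victim is set only for "spill".
    """
    # Only transactions whose commit has not been read are spill candidates.
    incomplete = [t for t in txns if not t.commit_seen]
    if incomplete:
        # Spill the largest incomplete transaction.
        return "spill", max(incomplete, key=lambda t: t.size_bytes)
    if txns:
        # Only complete transactions remain: wait for the worker to publish
        # and commit them, which frees their memory.
        return "sleep", None
    # Nothing was captured: try to go beyond the memory budget.
    return "exceed_limit", None
```

For example, given one committed transaction of 500 bytes and two incomplete ones of 300 and 400 bytes, the sketch selects the 400-byte transaction for spilling even though the committed one is larger.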
Temporary fix
Comments
APAR Information
APAR number
PM04195
Reported component name
WS REPLICATION
Reported component ID
5655L8800
Reported release
910
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt
Submitted date
2009-12-25
Closed date
2010-01-08
Last modified date
2010-02-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK53447 UK53448 UK53449 UK53450
Modules/Macros
ASNACMD ASNADMSP ASNAPPLY ASNCAP ASNCCMD ASNLOAD ASNMCMD ASNMIG8 ASNMON ASNMONIT ASNPLXFY ASNQACMD ASNQAPP ASNQCAP ASNQCCMD ASNQDEP ASNQMFMT ASNQXFMT ASNRBASE ASNTDIFF ASNTRC ASN2BASE
Fix information
Fixed component name
WS REPLICATION
Fixed component ID
5655L8800
Applicable component levels
R910 PSY UK53447
UP10/01/19 P F001
R911 PSY UK53448
UP10/01/20 P F001
R912 PSY UK53449
UP10/01/19 P F001
R913 PSY UK53450
UP10/01/19 P F001
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
01 February 2010