IBM Support

PM04195: ON RARE OCCASIONS QCAPTURE MAY MISS OR DUPLICATE ROWS DURING SPILLING.

A fix is available


APAR status

  • Closed as program error.

Error description

  • *********************
    * HIPER information *
    *********************
    On very rare occasions, Q Capture might miss some rows or send
    duplicate rows (SQL0803N) during spilling. The missing-rows
    problem can be caused by two threads (publisher and spill)
    accessing the same transaction object simultaneously, one to
    publish it and one to spill it.
    
    If you do not have this APAR applied, we recommend increasing the
    memory limit (MEMORY_LIMIT) and possibly the region size to avoid
    spilling.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: 1- Q/SQL Capture                             *
    ****************************************************************
    * PROBLEM DESCRIPTION: 1- a race between the publishing and    *
    *                      spilling threads causes missing or      *
    *                      duplicate rows                          *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    1- A flag is checked to ensure that only transactions whose
    commit has not yet been read from the log are spilled.
    

Problem conclusion

  • 1- The problem has been fixed.
    When the Capture log reader thread needs to free up memory, it
    starts writing transactions to disk; this behavior is called
    spilling. A problem can arise when Capture spills a captured
    transaction that is in the process of being published by the
    publishing (worker) thread. The log reader thread checks that a
    captured transaction is not being published before spilling it,
    but the check in the code is incorrect: there is a timing hole
    during which the log reader thread does not detect that the
    worker thread is about to start publishing the transaction. As a
    result, the Capture program's internal memory for the transaction
    container is corrupted. The symptoms of this problem include
    missing rows and/or duplicate rows at the target. It can affect
    the spilled transaction as well as subsequent transactions.
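The timing hole described above is a classic check-then-act race. The Python sketch below replays one bad interleaving deterministically, step by step, instead of running real threads. It is a simplified model under stated assumptions, not Capture's code: the `Txn` class, its `publishing` flag, and the four step functions are hypothetical, and the model shows only the missing-rows symptom, not the duplicate-rows one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Txn:
    """Hypothetical in-memory transaction container (illustration only)."""
    rows: List[str] = field(default_factory=list)
    publishing: bool = False
    spilled_to_disk: List[str] = field(default_factory=list)


def reader_check(txn: Txn) -> bool:
    # Log reader: "is this transaction being published?" The answer can be
    # stale by the time the reader acts on it.
    return not txn.publishing


def worker_start_publish(txn: Txn) -> None:
    # Worker: claims the transaction for publishing.
    txn.publishing = True


def reader_spill(txn: Txn) -> None:
    # Log reader: moves the rows out of memory and onto "disk".
    txn.spilled_to_disk.extend(txn.rows)
    txn.rows.clear()


def worker_publish(txn: Txn) -> List[str]:
    # Worker: sends whatever rows are still in memory to the target.
    return list(txn.rows)


# Replay the bad interleaving one step at a time.
txn = Txn(rows=["r1", "r2", "r3"])
ok_to_spill = reader_check(txn)   # True: the worker has not started yet
worker_start_publish(txn)         # timing hole: the worker starts now
if ok_to_spill:
    reader_spill(txn)             # the reader acts on the stale check
published = worker_publish(txn)
print(published)                  # [] : the rows never reach the target
```

In this model, making the check and the spill atomic with respect to the worker (for example, under one shared lock) would close the window; the APAR's actual fix, described below, takes a different route and never spills a transaction that the worker could be publishing.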
    The defect occurs only if all of the following conditions are met:
    - the Capture program has used almost all of its memory
      (controlled by MEMORY_LIMIT) to capture transactions;
    - there is one captured, committed transaction that the worker
      thread has not yet published in this commit interval; this
      transaction is the largest transaction in memory, and the
      worker thread is just about to start publishing it;
    - more than 64 KB of memory is still available before reaching
      MEMORY_LIMIT (otherwise, the log reader would sleep to give the
      worker a chance to free up memory); the log reader thread reads
      more records from the log, causing MEMORY_LIMIT to be exceeded,
      and starts spilling the largest transaction in memory to disk;
    - the worker thread starts publishing the transaction that the
      log reader thread is currently spilling.
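The conditions above can be modeled as the log reader's decision before the fix. In this hedged Python sketch, the function name, the dictionary layout (transaction id mapped to its size in memory and whether its commit has been read), and the exact comparison against the headroom are assumptions; only MEMORY_LIMIT, the 64 KB margin, and the "spill the largest transaction in memory" rule come from this APAR.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: transaction id -> (bytes in memory, commit already read?)
Txns = Dict[str, Tuple[int, bool]]

SLEEP_HEADROOM = 64 * 1024  # the 64 KB margin described in this APAR


def pre_fix_log_reader_step(used: int, memory_limit: int,
                            next_record_size: int,
                            txns: Txns) -> Tuple[str, Optional[str]]:
    """Simplified model of the log reader's decision before the fix."""
    if memory_limit - used <= SLEEP_HEADROOM:
        # Almost out of memory: sleep so the worker can publish and free memory.
        return ("sleep", None)
    used += next_record_size  # read more records from the log
    if used > memory_limit and txns:
        # Defect: the largest transaction is picked even if its commit was
        # already read, i.e. even if the worker may be about to publish it.
        victim = max(txns, key=lambda tid: txns[tid][0])
        return ("spill", victim)
    return ("continue", None)


txns = {"T1": (600_000, True), "T2": (200_000, False)}
print(pre_fix_log_reader_step(900_000, 1_000_000, 150_000, txns))
# ('spill', 'T1'): the committed transaction is chosen
```

Here T1 is complete and is the largest transaction in memory, matching the second condition; if the worker starts publishing T1 while it is being spilled, the timing hole is hit.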
    This is a rare condition that depends on I/O and processor speeds.
    If the worker thread gets to the transaction first, there is no
    problem. If the log reader thread finishes spilling the
    transaction before the worker thread publishes it, there is no
    problem either.
    The fix is that Capture spills only transactions whose commit has
    not yet been read from the log. A completely captured transaction
    is not spilled, because once it is published and committed to MQ
    (for Q Replication) or to the DBMS (for SQL Replication), its
    memory is freed. Capture chooses to spill the largest incomplete
    transaction in memory to disk. If there is no incomplete
    transaction to spill but some transactions were captured, Capture
    sleeps repeatedly, as governed by COMMIT_INTERVAL, until the
    published transactions are committed and memory is freed. If no
    transaction was captured, Capture tries to exceed its
    MEMORY_LIMIT budget; if no memory is available from the system,
    Capture terminates with an error.
    Even so, this fix prevents any data loss or duplicate data.
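The corrected spill policy can be sketched as a small decision function. This Python sketch is an illustration, not the Capture implementation: it assumes a hypothetical dictionary that maps each transaction id to its size in memory and whether its commit has been read from the log, and the returned labels (`"spill"`, `"sleep_commit_interval"`, `"exceed_memory_limit"`) are invented names for the three outcomes described above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: transaction id -> (bytes in memory, commit already read?)
Txns = Dict[str, Tuple[int, bool]]


def post_fix_action(txns: Txns) -> Tuple[str, Optional[str]]:
    """Simplified model of Capture's choice once MEMORY_LIMIT would be exceeded."""
    incomplete = {tid: size for tid, (size, committed) in txns.items()
                  if not committed}
    if incomplete:
        # Spill the largest transaction whose commit has not been read yet.
        return ("spill", max(incomplete, key=incomplete.get))
    if txns:
        # Only complete transactions remain: sleep (per COMMIT_INTERVAL) until
        # the published transactions are committed and memory is freed.
        return ("sleep_commit_interval", None)
    # Nothing captured: try to go beyond MEMORY_LIMIT; if the system has no
    # memory left, Capture ends with an error.
    return ("exceed_memory_limit", None)


print(post_fix_action({"T1": (600_000, True), "T2": (200_000, False)}))
print(post_fix_action({"T1": (600_000, True)}))
print(post_fix_action({}))
```

The three calls print `('spill', 'T2')`, `('sleep_commit_interval', None)`, and `('exceed_memory_limit', None)`. In this model, the worker only publishes transactions whose commit has been read, so excluding them from spilling means the two threads never operate on the same transaction.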
    

Temporary fix

Comments

APAR Information

  • APAR number

    PM04195

  • Reported component name

    WS REPLICATION

  • Reported component ID

    5655L8800

  • Reported release

    910

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2009-12-25

  • Closed date

    2010-01-08

  • Last modified date

    2010-02-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UK53447 UK53448 UK53449 UK53450

Modules/Macros

  •    ASNACMD  ASNADMSP ASNAPPLY ASNCAP   ASNCCMD
    ASNLOAD  ASNMCMD  ASNMIG8  ASNMON   ASNMONIT ASNPLXFY ASNQACMD
    ASNQAPP  ASNQCAP  ASNQCCMD ASNQDEP  ASNQMFMT ASNQXFMT ASNRBASE
    ASNTDIFF ASNTRC   ASN2BASE
    

Fix information

  • Fixed component name

    WS REPLICATION

  • Fixed component ID

    5655L8800

Applicable component levels

  • R910 PSY UK53447

       UP10/01/19 P F001

  • R911 PSY UK53448

       UP10/01/20 P F001

  • R912 PSY UK53449

       UP10/01/19 P F001

  • R913 PSY UK53450

       UP10/01/19 P F001

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Product: InfoSphere Replication Server; Version: 910; Platform: Platform Independent; Business Unit: IBM Software w/o TPS; Line of Business: Data and AI

Document Information

Modified date:
01 February 2010