WiredTiger / WT-8924

Don't check against on disk time window if there is an insert list when checking for conflicts in row-store

    • 5
    • Storage - Ra 2022-03-21
    • v5.3, v5.0, v4.4

      Issue Status as of May 2, 2022

      This issue, present in MongoDB 4.4.10 to 4.4.13 and 5.0.4 to 5.0.7, may cause replication to stall on secondary replica set members in a sharded cluster handling cross-shard transactions.

      The bug is triggered when WiredTiger erroneously returns a write conflict while deciding whether an update to a record is allowed. If MongoDB retries the operation that caused the conflict in WiredTiger, it enters an indefinite retry loop, and oplog application stalls on secondary nodes.
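      The retry behaviour described above can be illustrated with a minimal sketch. It is not MongoDB or WiredTiger code: every name prefixed with `sketch_` is hypothetical, and the return codes are placeholders rather than WiredTiger's real error values. The point is only that an unbounded retry loop terminates when a conflict is genuine and eventually clears, but never terminates when the conflict is reported on every attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; not WiredTiger's actual error values. */
enum { SKETCH_OK = 0, SKETCH_WRITE_CONFLICT = 1 };

/* Hypothetical storage operation: returns SKETCH_OK or SKETCH_WRITE_CONFLICT. */
typedef int (*sketch_op_fn)(void *ctx);

/*
 * Retry an operation for as long as it reports a write conflict.
 * max_attempts == 0 means "retry without limit", which never returns if the
 * conflict is reported on every attempt. Returns the number of attempts made,
 * or -1 if a bounded retry gave up.
 */
int sketch_retry_on_conflict(sketch_op_fn op, void *ctx, int max_attempts)
{
    int attempts = 0;

    assert(op != NULL);
    for (;;) {
        attempts++;
        if (op(ctx) != SKETCH_WRITE_CONFLICT)
            return attempts;
        if (max_attempts > 0 && attempts >= max_attempts)
            return -1;
    }
}

/* A genuine conflict: ctx points to the number of conflicts still to report. */
int sketch_transient_conflict(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_WRITE_CONFLICT;
    }
    return SKETCH_OK;
}

/* A spurious conflict, as in this issue: reported on every attempt. */
int sketch_spurious_conflict(void *ctx)
{
    (void)ctx;
    return SKETCH_WRITE_CONFLICT;
}
```

      The attempt cap exists only so the sketch can be exercised safely; the issue status above describes a loop with no such bound.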

      A MongoDB cluster may be affected by this bug if:

      • the cluster is sharded
      • the application uses cross-shard transactions
      • the cluster is using versions 4.4.10 to 4.4.13 or 5.0.4 to 5.0.7 on secondary nodes
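
      The three conditions above can be written as a small predicate. This is a sketch only: the function names are hypothetical, and the version ranges are copied from the issue status rather than from any MongoDB API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if major.minor.patch is in a range listed as affected. */
bool sketch_version_affected(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);
    if (major == 4 && minor == 4)
        return patch >= 10 && patch <= 13; /* 4.4.10 to 4.4.13 */
    if (major == 5 && minor == 0)
        return patch >= 4 && patch <= 7;   /* 5.0.4 to 5.0.7 */
    return false;
}

/* All three conditions must hold; the version is that of the secondary nodes. */
bool sketch_cluster_may_be_affected(
    bool sharded, bool cross_shard_txns, int major, int minor, int patch)
{
    return sharded && cross_shard_txns && sketch_version_affected(major, minor, patch);
}
```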

      If the bug is triggered, the cluster's secondary nodes will experience indefinite growth in replication lag.

      Secondary nodes on which replication has stalled can be restarted to resume replication.

      This issue is fixed in MongoDB 4.4.14 and 5.0.8.

      Original Description

      While implementing FLCS-related (fixed-length column store) changes in WT-8019, a change was made to stop checking whether the insert list on the cbt was null before checking against the on-disk time window. This change may be correct for FLCS but isn't correct for row-store.

      This is only a problem if cbt->slot is set, that is, if it is neither unset nor UINT32_MAX. An alternative solution might be to clear the cbt slot on an insert list row search; however, that is still open for discussion.
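
      The guard discussed in the description can be sketched as follows. This is a simplified model, not WiredTiger source: the `sketch_` types and functions are hypothetical, the time window is reduced to a single start transaction ID, the conflict rule is a placeholder, and UINT32_MAX is assumed to be the only "no slot" marker. The sketch also assumes that when a row-store search ends on an insert list, the slot left on the cursor may refer to an unrelated on-disk row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SKETCH_ROW_STORE, SKETCH_FLCS } sketch_store_type;

/* Simplified on-disk time window: only the start transaction ID is modelled. */
typedef struct {
    uint64_t start_txn;
} sketch_time_window;

/* Simplified stand-in for the cursor state (the "cbt"). */
typedef struct {
    sketch_store_type type;
    const void *ins;                   /* non-NULL if the search ended on an insert list */
    uint32_t slot;                     /* on-disk slot; UINT32_MAX assumed to mean "none" */
    const sketch_time_window *disk_tw; /* on-disk time windows, indexed by slot */
} sketch_cursor;

/* Decide whether the on-disk time window is relevant to the conflict check. */
bool sketch_should_check_disk_tw(const sketch_cursor *cbt)
{
    assert(cbt != NULL);
    if (cbt->slot == UINT32_MAX)
        return false; /* no on-disk slot to consult */
    if (cbt->type == SKETCH_ROW_STORE && cbt->ins != NULL)
        return false; /* row-store with an insert list: skip the on-disk check */
    return true;
}

/* Placeholder rule: conflict if the on-disk value started at or after snapshot_max. */
bool sketch_has_conflict(const sketch_cursor *cbt, uint64_t snapshot_max)
{
    if (!sketch_should_check_disk_tw(cbt))
        return false;
    return cbt->disk_tw[cbt->slot].start_txn >= snapshot_max;
}

/* The alternative raised in the description: clear the slot after an insert list search. */
void sketch_clear_slot(sketch_cursor *cbt)
{
    cbt->slot = UINT32_MAX;
}
```

      In this model both approaches avoid the spurious conflict for row-store: the guard ignores the slot when an insert list is present, while clearing the slot removes the stale reference altogether.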

            luke.pearson@mongodb.com Luke Pearson