Core Server / SERVER-53376

[4.4] dbHash can live lock an aborting index build

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 4.4.4
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Fully Compatible
    • ALL
    • Execution Team 2021-01-11
    • 22

      dbHash is allowed to hold storage snapshots open indefinitely while waiting for collection locks. Multi-document transactions also hold snapshots open while waiting for locks, but they have lock acquisition deadlines.

      This behavior introduces a very specific live lock in 4.4:

      • dbHash opens a read snapshot using the current cluster time
      • An index build aborts. While holding an X collection lock, the index build attempts to set a ghost commit timestamp for the catalog write using the same cluster time.
      • Due to an assertion in WiredTiger (WT), setting the ghost timestamp fails because there is an open transaction (dbHash) reading at the same timestamp.
      • The index build retries indefinitely, waiting for the dbHash reader to finish.
      • The dbHash operation is unable to make progress because it is blocked by the X lock.
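
      The cycle in the steps above can be illustrated with a toy, single-threaded model. This is only a sketch of the dependency loop, not server code: the function name, the timestamp value, and the round-based retry loop are all hypothetical, and the storage engine check is reduced to "a ghost timestamp cannot be set while a reader is open at the same timestamp".

```python
def simulate_livelock(max_rounds=5):
    """Toy model of the cycle: returns (ghost_timestamp_set, dbhash_done)."""
    cluster_time = 100  # arbitrary illustrative timestamp

    # dbHash has opened a read snapshot at the current cluster time.
    open_read_timestamps = {cluster_time}
    # The aborting index build holds the X collection lock.
    x_lock_held_by_index_build = True

    ghost_timestamp_set = False
    dbhash_done = False

    for _ in range(max_rounds):
        # Index build: retry setting the ghost timestamp. In this model it
        # only succeeds once no reader is open at that timestamp.
        if cluster_time not in open_read_timestamps:
            ghost_timestamp_set = True
            x_lock_held_by_index_build = False

        # dbHash: retry taking the collection lock. It only succeeds once
        # the index build has released the X lock, and only then does it
        # finish and close its snapshot.
        if not x_lock_held_by_index_build:
            dbhash_done = True
            open_read_timestamps.discard(cluster_time)

        if ghost_timestamp_set or dbhash_done:
            break

    return ghost_timestamp_set, dbhash_done
```

      In this model neither side ever makes progress, regardless of how many rounds are allowed, because each is waiting on a resource that the other releases only after finishing.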

      In general, I believe we should impose a lock timeout such that dbHash cannot hold open snapshots and block indefinitely, much like we already do for multi-document transactions.
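
      A minimal sketch of the proposed behavior, assuming a bounded lock wait after which the reader gives up and closes its snapshot. The names `ToySnapshot`, `LockTimeout`, and `hash_with_lock_deadline` are hypothetical and do not correspond to server APIs; Python's `threading.Lock` stands in for the collection lock.

```python
import threading


class LockTimeout(Exception):
    """Raised when the collection lock is not acquired before the deadline."""


class ToySnapshot:
    """Hypothetical stand-in for an open storage snapshot."""

    def __init__(self, read_timestamp):
        self.read_timestamp = read_timestamp
        self.is_open = True

    def close(self):
        self.is_open = False


def hash_with_lock_deadline(collection_lock, snapshot, timeout_secs):
    """Wait for the lock for at most timeout_secs while holding the snapshot."""
    # Bounded wait: acquire() returns False if the timeout expires first.
    if not collection_lock.acquire(timeout=timeout_secs):
        # Give up and close the snapshot so that a writer blocked on this
        # reader's timestamp can proceed.
        snapshot.close()
        raise LockTimeout("collection lock not acquired before deadline")
    try:
        return "hashed"  # placeholder for the real hashing work
    finally:
        collection_lock.release()
        snapshot.close()
```

      With a deadline in place, the reader eventually fails and closes its snapshot, which breaks the cycle from the reader's side without touching the index build code.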

      The alternative to imposing a lock deadline would be to fix ghost timestamps, but only in 4.4. I believe a dbHash change will avoid the risk of modifying 4.4 index build code that has been removed in master. Adding a lock timeout to dbHash assumes there are no other consequences of the index build ghost timestamping behavior. This same bug applies to background validation, which will need to undergo the same lock timeout change (SERVER-53445).

            Assignee: Louis Williams (louis.williams@mongodb.com)
            Reporter: Louis Williams (louis.williams@mongodb.com)