  1. Core Server
  2. SERVER-30487

RangeDeleter holds WT transaction open while waiting for majority

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.4.7
    • Fix Version/s: 3.4.9
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      At log level 5, with additional debug messages on entry to and exit from _waitForMajority, we see three WT transactions (536, 537, 538) begun and committed for the three documents deleted. An additional WT transaction (540) is then created, remains open while _waitForMajority is called, and is rolled back only after the wait returns.

      2017-08-02T16:48:48.562-0400 I SHARDING [RangeDeleter] Deleter starting delete for: test.c from { _id: MinKey } -> { _id: MaxKey }, with opId: 358
      2017-08-02T16:48:48.562-0400 D SHARDING [RangeDeleter] begin removal of { : MinKey } to { : MaxKey } in test.c with write concern: { w: 1, j: false, wtimeout: 0 }
       
      2017-08-02T16:48:48.562-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 536
      2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT commit_transaction for snapshot id 536
      2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 537
      2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT commit_transaction for snapshot id 537
      2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 538
      2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT commit_transaction for snapshot id 538
       
      2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 540
      2017-08-02T16:48:48.563-0400 D SHARDING [RangeDeleter] end removal of { : MinKey } to { : MaxKey } in test.c (took 0ms)
      2017-08-02T16:48:48.563-0400 I SHARDING [RangeDeleter] rangeDeleter deleted 3 documents for test.c from { _id: MinKey } -> { _id: MaxKey }
      2017-08-02T16:48:48.563-0400 I SHARDING [RangeDeleter] xxx enter _waitForMajority
      2017-08-02T16:48:48.580-0400 I SHARDING [RangeDeleter] xxx exit _waitForMajority
      2017-08-02T16:48:48.580-0400 D STORAGE  [RangeDeleter] WT rollback_transaction for snapshot id 540
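
The pattern in the excerpt above can be checked mechanically. The following is a minimal sketch, not part of any MongoDB tooling: it assumes the log format shown above, including the ad hoc "xxx enter/exit _waitForMajority" debug messages added for this investigation. The function name `transactions_open_during_wait` is hypothetical. It reports every WT transaction that was already open when the wait began, how it ended, and how many milliseconds it stayed open.

```python
import re
from datetime import datetime, timedelta

# Patterns for the debug lines shown in the excerpt. The "xxx" markers are the
# temporary messages added for this investigation, not standard server output.
TXN_RE = re.compile(r"WT (begin|commit|rollback)_transaction for snapshot id (\d+)")
WAIT_RE = re.compile(r"xxx (enter|exit) _waitForMajority")

LOG = """
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 538
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT commit_transaction for snapshot id 538
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 540
2017-08-02T16:48:48.563-0400 I SHARDING [RangeDeleter] xxx enter _waitForMajority
2017-08-02T16:48:48.580-0400 I SHARDING [RangeDeleter] xxx exit _waitForMajority
2017-08-02T16:48:48.580-0400 D STORAGE  [RangeDeleter] WT rollback_transaction for snapshot id 540
"""


def parse_ts(line):
    # The timestamp is the first whitespace-delimited token on each line.
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%f%z")


def transactions_open_during_wait(lines):
    """Return (snapshot id, end action, open duration in ms) for each
    transaction that was open at the moment _waitForMajority was entered."""
    open_txns = {}    # snapshot id -> begin timestamp
    spanning = set()  # ids that were open when a wait started
    held = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        wait = WAIT_RE.search(line)
        if wait:
            if wait.group(1) == "enter":
                spanning.update(open_txns)
            continue
        txn = TXN_RE.search(line)
        if not txn:
            continue
        action, snap_id = txn.group(1), int(txn.group(2))
        if action == "begin":
            open_txns[snap_id] = parse_ts(line)
        else:
            begun = open_txns.pop(snap_id, None)
            if begun is not None and snap_id in spanning:
                millis = (parse_ts(line) - begun) // timedelta(milliseconds=1)
                held.append((snap_id, action, millis))
    return held


print(transactions_open_during_wait(LOG.splitlines()))
```

On the lines above this reports only snapshot 540, held for the 17 ms between its begin and its rollback. Under replication lag the same check would show a correspondingly longer interval.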
      

      If there is replication lag, this can result in a very long-running transaction. That in turn can leave the instance stuck with a full cache, causing a stall of as much as an hour until _waitForMajority times out.
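
To make the ordering problem concrete, here is a toy model. It is a sketch under stated assumptions, not the actual patch and not MongoDB's API: `ToyRecoveryUnit`, `abandon_snapshot`, and both `delete_range_*` functions are hypothetical names invented for illustration. The first function mirrors the log above, where a transaction is still open when the blocking wait starts. The second shows one general way to avoid that, which is to release the storage transaction before blocking.

```python
class ToyRecoveryUnit:
    """Hypothetical stand-in for a storage-engine session (not a real API)."""

    def __init__(self):
        self.active = False
        self.events = []

    def begin(self):
        if not self.active:
            self.active = True
            self.events.append("begin")

    def abandon_snapshot(self):
        # Roll back any open transaction so it no longer pins a snapshot.
        if self.active:
            self.active = False
            self.events.append("rollback")


def wait_for_majority(ru):
    # Stand-in for the blocking wait; returns whether a transaction
    # was open for the whole time we were blocked.
    ru.events.append("wait")
    return ru.active


def delete_range_holding_txn(ru):
    ru.begin()                      # transaction opened after the deletes
    held = wait_for_majority(ru)    # blocks with the transaction still open
    ru.abandon_snapshot()           # rolled back only after the wait
    return held


def delete_range_releasing_txn(ru):
    ru.begin()
    ru.abandon_snapshot()           # release before blocking
    return wait_for_majority(ru)    # nothing is held open during the wait


bad, good = ToyRecoveryUnit(), ToyRecoveryUnit()
print(delete_range_holding_txn(bad), bad.events)
print(delete_range_releasing_txn(good), good.events)
```

In the first ordering the event sequence is begin, wait, rollback, matching the log for snapshot 540. In the second the rollback precedes the wait, so the length of the wait no longer determines how long the transaction lives.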
