[SERVER-30487] RangeDeleter holds WT transaction open while waiting for majority
Created: 02/Aug/17  Updated: 30/Oct/23  Resolved: 08/Aug/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.7
Fix Version/s: 3.4.9

Type: Bug
Priority: Major - P3
Reporter: Bruce Lucas (Inactive)
Assignee: Kaloian Manassiev
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

At log level 5, with additional debug messages added on entry to and exit from _waitForMajority, we see three WT transactions (536, 537, 538) begun and committed for the three documents deleted. An additional WT transaction (540) is then created, remains open while _waitForMajority is called, and is only then rolled back.

2017-08-02T16:48:48.562-0400 I SHARDING [RangeDeleter] Deleter starting delete for: test.c from { _id: MinKey } -> { _id: MaxKey }, with opId: 358
2017-08-02T16:48:48.562-0400 D SHARDING [RangeDeleter] begin removal of { : MinKey } to { : MaxKey } in test.c with write concern: { w: 1, j: false, wtimeout: 0 }
 
2017-08-02T16:48:48.562-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 536
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT commit_transaction for snapshot id 536
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 537
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT commit_transaction for snapshot id 537
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 538
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT commit_transaction for snapshot id 538
 
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 540
2017-08-02T16:48:48.563-0400 D SHARDING [RangeDeleter] end removal of { : MinKey } to { : MaxKey } in test.c (took 0ms)
2017-08-02T16:48:48.563-0400 I SHARDING [RangeDeleter] rangeDeleter deleted 3 documents for test.c from { _id: MinKey } -> { _id: MaxKey }
2017-08-02T16:48:48.563-0400 I SHARDING [RangeDeleter] xxx enter _waitForMajority
2017-08-02T16:48:48.580-0400 I SHARDING [RangeDeleter] xxx exit _waitForMajority
2017-08-02T16:48:48.580-0400 D STORAGE  [RangeDeleter] WT rollback_transaction for snapshot id 540

This can result in a very long-running transaction if there is replication lag. That in turn can leave the instance stuck with a full cache, stalling for as much as an hour until the _waitForMajority times out.
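To spot this pattern in a larger log, one option is to pair each WT begin_transaction line with its commit_transaction or rollback_transaction line and report how long each snapshot stayed open. The following is a minimal sketch, not a MongoDB tool: the names WT_LINE, parse_ts, transaction_durations, and LOG are made up for illustration, and the regular expression assumes the exact log-line layout shown in the excerpt above.

```python
import re
from datetime import datetime, timedelta

# Assumes the log-line layout shown in the excerpt above (log level 5).
WT_LINE = re.compile(
    r"^(\S+) \w\s+STORAGE\s+\[[^\]]+\] WT (begin|commit|rollback)_transaction "
    r"for snapshot id (\d+)"
)


def parse_ts(text):
    # Example timestamp: 2017-08-02T16:48:48.562-0400
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f%z")


def transaction_durations(lines):
    """Return {snapshot_id: (outcome, duration_ms)} for every closed transaction."""
    started = {}
    closed = {}
    for line in lines:
        match = WT_LINE.match(line.strip())
        if not match:
            continue  # not a WT transaction line
        ts = parse_ts(match.group(1))
        action = match.group(2)
        snapshot_id = int(match.group(3))
        if action == "begin":
            started[snapshot_id] = ts
        elif snapshot_id in started:
            elapsed = ts - started.pop(snapshot_id)
            # Whole milliseconds, matching the log's timestamp resolution.
            closed[snapshot_id] = (action, elapsed // timedelta(milliseconds=1))
    return closed


# Lines copied from the excerpt above.
LOG = """\
2017-08-02T16:48:48.562-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 536
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT commit_transaction for snapshot id 536
2017-08-02T16:48:48.563-0400 D STORAGE  [RangeDeleter] WT begin_transaction for snapshot id 540
2017-08-02T16:48:48.563-0400 I SHARDING [RangeDeleter] xxx enter _waitForMajority
2017-08-02T16:48:48.580-0400 I SHARDING [RangeDeleter] xxx exit _waitForMajority
2017-08-02T16:48:48.580-0400 D STORAGE  [RangeDeleter] WT rollback_transaction for snapshot id 540
"""

for snapshot_id, (outcome, ms) in sorted(transaction_durations(LOG.splitlines()).items()):
    print(f"snapshot {snapshot_id}: {outcome} after {ms} ms")
```

In this excerpt, snapshot 540 stays open for the full duration of the _waitForMajority call; with replication lag, that interval would grow accordingly.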



 Comments   
Comment by Githook User [ 08/Aug/17 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-30487 Make sharding range deleter yield the WT snapshot after each iteration
Branch: v3.4
https://github.com/mongodb/mongo/commit/ce41c7bb609db8f33c6b3a04547325d01605cbc8

Comment by Kaloian Manassiev [ 03/Aug/17 ]

From inspecting the 3.4 code, the reason range deletion holds the WT snapshot for a long time is the lack of a ScopedTransaction at each loop iteration, which is what invokes abandonSnapshot. Without this, the snapshot remains on the recovery unit for as long as the recovery unit is alive.
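The effect described here can be illustrated with a toy model. This is a sketch only and is not MongoDB code: ToyRecoveryUnit, delete_range, and run are hypothetical names, and the model assumes that any read lazily opens a snapshot, that a commit closes it, and that the final read (which finds nothing left to delete) opens one more snapshot, as transaction 540 does in the log.

```python
class ToyRecoveryUnit:
    """Hypothetical stand-in for a recovery unit; not the MongoDB class."""

    def __init__(self):
        self.snapshot_open = False
        self.events = []

    def read(self):
        # Assumption: any read lazily opens a snapshot if none is open.
        if not self.snapshot_open:
            self.snapshot_open = True
            self.events.append("begin")

    def commit(self):
        self.snapshot_open = False
        self.events.append("commit")

    def abandon_snapshot(self):
        # Releases the snapshot if one is open; otherwise does nothing.
        if self.snapshot_open:
            self.snapshot_open = False
            self.events.append("rollback")


def delete_range(docs, ru, yield_each_iteration):
    """Delete one document per iteration; optionally yield the snapshot each time."""
    remaining = list(docs)
    deleted = 0
    done = False
    while not done:
        ru.read()  # look for the next document in the range
        if remaining:
            remaining.pop()
            ru.commit()
            deleted += 1
        else:
            done = True  # the last read found nothing, but it opened a snapshot
        if yield_each_iteration:
            ru.abandon_snapshot()
    return deleted


def run(yield_each_iteration):
    ru = ToyRecoveryUnit()
    deleted = delete_range(["a", "b", "c"], ru, yield_each_iteration)
    # State at the point where the wait for majority would begin.
    held_during_wait = ru.snapshot_open
    ru.abandon_snapshot()  # models the recovery unit going away afterwards
    return deleted, held_during_wait, ru.events


for flag in (False, True):
    deleted, held, events = run(flag)
    print(f"yield={flag}: deleted={deleted}, snapshot held during wait={held}")
```

Without the per-iteration yield, the model reproduces the sequence in the log: three begin/commit pairs, a fourth begin, and a rollback that happens only after the wait. With the yield, the fourth snapshot is released before the wait starts.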
