Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 6.2.0-rc0
Affects Version/s: 6.2.0-rc0
Component/s: Sharding
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
Sharding EMEA 2022-10-31, Sharding EMEA 2022-11-14
Linked BF Score:
135
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

As part of SERVER-65016, new code that removes the range deletion documents was added to the existing drop collection code as an optimization. However, this code uses an alternative client region in order to remove multiple documents. This is done because shardsvr_drop_collection_participant uses the retryable write machinery for replay protection.

The unintended effect is that a thread dropping a collection first checks out a session and then, as part of taking the collection lock, tries to acquire the RSTL (replication state transition lock) while executing the DBClient command that removes the range deletion documents. If a stepdown sneaks in after the session is checked out, the stepdown thread acquires the RSTL and then tries to check out and kill all running sessions. The result is a deadlock: the drop thread holds the session while waiting for the RSTL, and the stepdown thread holds the RSTL while waiting for the session.
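The lock-order inversion can be reproduced in miniature. The sketch below is a simplified model, not server code: `session` and `rstl` are hypothetical Python locks standing in for the checked-out session and the RSTL, and a timeout replaces the indefinite wait so the demo terminates instead of hanging.

```python
import threading

# Hypothetical stand-ins: `session` models the checked-out logical session,
# `rstl` models the replication state transition lock (RSTL).
session = threading.Lock()
rstl = threading.Lock()


def acquire_in_order(first, second, barrier, timeout=0.2):
    """Hold `first`, then try to take `second`; return True if it was obtained."""
    with first:
        barrier.wait()  # both threads now hold their first resource
        # A real deadlock would block forever; the timeout lets the demo exit.
        got = second.acquire(timeout=timeout)
        if got:
            second.release()
        barrier.wait()  # keep `first` held until both attempts have finished
    return got


def simulate_drop_vs_stepdown():
    barrier = threading.Barrier(2)
    results = {}

    def drop():
        # Drop collection: session checked out first, then the RSTL is needed.
        results["drop_got_rstl"] = acquire_in_order(session, rstl, barrier)

    def stepdown():
        # Stepdown: RSTL taken first, then sessions must be checked out.
        results["stepdown_got_session"] = acquire_in_order(rstl, session, barrier)

    threads = [threading.Thread(target=drop), threading.Thread(target=stepdown)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Both values are False: each thread waited on a resource the other held.
print(simulate_drop_vs_stepdown())
```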

This situation can be seen between Thread 2 and Thread 99 in the attached stack trace log. One way to solve it is to create the operation context the same way the rename collection metadata command does: link the new operation context created in the alternative client region to the parent's cancellation token. That way, when the parent operation context is interrupted during the stepdown, the thread waiting for the lock stops waiting and releases the session, allowing the stepdown thread to check it out.
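The proposed fix can be modeled the same way. In this hypothetical sketch, `CancellationSource` and `interruptible_acquire` are illustrative Python stand-ins, not the server's cancellation or operation context API, and polling with a short timeout approximates an interruptible lock wait. The drop thread's wait is linked to the parent token, so canceling that token during the stepdown lets the drop thread give up and release the session.

```python
import threading


class CancellationSource:
    """Hypothetical stand-in for the parent operation's cancellation token."""

    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        self._event.set()

    def is_canceled(self):
        return self._event.is_set()


def interruptible_acquire(lock, token, poll=0.01):
    """Wait for `lock`, but give up as soon as `token` is canceled."""
    while not token.is_canceled():
        if lock.acquire(timeout=poll):
            return True
    return False


def simulate_with_linked_cancellation():
    session = threading.Lock()  # models the checked-out session
    rstl = threading.Lock()     # models the RSTL
    parent_token = CancellationSource()
    session_checked_out = threading.Event()
    rstl_taken = threading.Event()
    results = {}

    def drop():
        with session:
            session_checked_out.set()
            rstl_taken.wait()  # the stepdown sneaks in after checkout
            # The child's lock wait is linked to the parent token.
            got = interruptible_acquire(rstl, parent_token)
            if got:
                rstl.release()
            results["drop_got_rstl"] = got
        # Leaving the `with` block releases the session.

    def stepdown():
        session_checked_out.wait()
        with rstl:
            rstl_taken.set()
            parent_token.cancel()  # interrupt the parent operation
            got = session.acquire(timeout=5)
            if got:
                session.release()
            results["stepdown_got_session"] = got

    threads = [threading.Thread(target=drop), threading.Thread(target=stepdown)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# drop_got_rstl is False, stepdown_got_session is True.
print(simulate_with_linked_cancellation())
```

The drop thread never obtains the RSTL, but because its wait ends on cancellation it releases the session, so the stepdown thread can proceed instead of deadlocking.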

BFG-1553779.log
1.17 MB
Oct 26 2022 05:17:43 PM UTC

is caused by

SERVER-65016 Remove range deletions as part of `dropCollection`

Closed

related to

SERVER-60161 Deadlock between config server stepdown and _configsvrRenameCollectionMetadata command

Closed

Assignee:: Marcos José Grillo Ramirez
Reporter:: Marcos José Grillo Ramirez
Participants:: Githook User, Marcos José Grillo Ramirez
Votes:: 0
Watchers:: 3

Created:: Oct 26 2022 05:24:13 PM UTC
Updated:: Oct 29 2023 09:31:26 PM UTC
Resolved:: Nov 04 2022 12:36:12 PM UTC
Confidence Status Last Update:: 27/Oct/22 5:10 PM