Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.1.10, 4.0.10
Affects Version/s: None
Component/s: Sharding
Labels:
- sharding-wfbf-day

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v4.0
Sprint:
Sharding 2019-04-22
Linked BF Score:
5
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The NamespaceSerializer is essentially an in-memory cache of the distributed lock meant to synchronize sharded metadata operations that must run on the config server primary, like _configsvrCreateCollection and _configsvrDropCollection. Roughly, the class works like this:

1. A thread that wants to lock a namespace calls NamespaceSerializer::lock(), which takes a class-wide mutex.
2. Under that mutex, lock() looks up the namespace in a map. Each map entry holds a condition variable, a waiters counter, and an inProgress boolean.
   1. If there is no entry, one is created with a new condition variable, a waiters counter of 1, and inProgress set to true.
   2. If there is an entry, the thread increments its waiters counter and waits on its condition variable until inProgress is false, then sets inProgress to true and proceeds.
3. lock() then returns a ScopedLock object. Its destructor decrements the waiters counter, sets inProgress to false, and calls notify_one() on the condition variable.
   1. If the waiters counter is now 0, the entry for the namespace is removed from the map.
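The steps above can be sketched with standard-library primitives. This is a simplified illustration, not the server's code: the class name NamespaceSerializerSketch, the entryCount() helper, and the use of std::mutex and std::condition_variable (with a plain, non-interruptible wait) are all assumptions made for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the class described in the ticket.
class NamespaceSerializerSketch {
    struct Entry {
        std::condition_variable cv;
        int waiters = 1;         // the creating thread counts itself
        bool inProgress = true;  // the creating thread holds the lock
    };

    std::mutex _mutex;  // the class-wide mutex
    std::map<std::string, std::shared_ptr<Entry>> _entries;

public:
    class ScopedLock {
    public:
        ScopedLock(NamespaceSerializerSketch& s, std::string ns)
            : _s(&s), _ns(std::move(ns)) {}
        ScopedLock(ScopedLock&& other) noexcept
            : _s(other._s), _ns(std::move(other._ns)) {
            other._s = nullptr;  // a moved-from lock releases nothing
        }
        ScopedLock(const ScopedLock&) = delete;
        ScopedLock& operator=(const ScopedLock&) = delete;

        ~ScopedLock() {
            if (!_s) return;
            std::lock_guard<std::mutex> lk(_s->_mutex);
            auto it = _s->_entries.find(_ns);
            Entry& e = *it->second;
            --e.waiters;
            e.inProgress = false;
            e.cv.notify_one();
            if (e.waiters == 0) _s->_entries.erase(it);  // last user cleans up
        }

    private:
        NamespaceSerializerSketch* _s;
        std::string _ns;
    };

    ScopedLock lock(const std::string& ns) {
        std::unique_lock<std::mutex> lk(_mutex);
        auto it = _entries.find(ns);
        if (it == _entries.end()) {
            // No entry: create one that is already held by this thread.
            _entries.emplace(ns, std::make_shared<Entry>());
        } else {
            // Existing entry: register as a waiter, then wait for our turn.
            std::shared_ptr<Entry> e = it->second;
            ++e->waiters;
            e->cv.wait(lk, [&] { return !e->inProgress; });
            e->inProgress = true;
        }
        return ScopedLock(*this, ns);
    }

    // Test helper (not part of the described class): number of map entries.
    std::size_t entryCount() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _entries.size();
    }
};
```

In the uncontended case the entry exists only while a ScopedLock is alive. A second thread calling lock() on the same namespace would block in cv.wait() until the first ScopedLock is destroyed.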

The waiters counter increment and the condition variable wait both happen before the ScopedLock object is created, and the wait is interruptible. A request with maxTimeMS (or one that is killed) may therefore throw after incrementing the counter, without the ScopedLock destructor ever running to decrement it. Once that happens, the counter can never reach 0 and the entry for the namespace will never be removed.
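The leak can be reproduced on a single thread with a small model. The sketch below is an assumption-laden illustration, not the server's code: LeakySerializer, unlock() (which stands in for the ScopedLock destructor), entryCount(), and leakedEntriesAfterInterrupt() are hypothetical names, and a timed std::condition_variable::wait_for that throws on timeout stands in for the interruptible wait.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical model of the bug; names are illustrative only.
class LeakySerializer {
    struct Entry {
        std::condition_variable cv;
        int waiters = 1;
        bool inProgress = true;
    };
    std::mutex _mutex;
    std::map<std::string, std::shared_ptr<Entry>> _entries;

public:
    // Throws std::runtime_error when the wait exceeds `timeout`, standing in
    // for an interrupted wait (maxTimeMS expiry or a killed operation).
    void lock(const std::string& ns, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        auto it = _entries.find(ns);
        if (it == _entries.end()) {
            _entries.emplace(ns, std::make_shared<Entry>());
            return;
        }
        std::shared_ptr<Entry> e = it->second;
        ++e->waiters;  // counted before the wait begins
        if (!e->cv.wait_for(lk, timeout, [&] { return !e->inProgress; })) {
            // BUG being modeled: we leave without undoing the increment.
            throw std::runtime_error("interrupted");
        }
        e->inProgress = true;
    }

    // Stand-in for the ScopedLock destructor.
    void unlock(const std::string& ns) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _entries.find(ns);
        if (it == _entries.end()) return;
        Entry& e = *it->second;
        --e.waiters;
        e.inProgress = false;
        e.cv.notify_one();
        if (e.waiters == 0) _entries.erase(it);
    }

    std::size_t entryCount() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _entries.size();
    }
};

// One holder, one waiter that times out, then the holder releases.
// Returns how many map entries remain afterwards.
inline std::size_t leakedEntriesAfterInterrupt() {
    LeakySerializer s;
    s.lock("db.coll", std::chrono::milliseconds(0));  // holder: waiters = 1
    bool interrupted = false;
    try {
        s.lock("db.coll", std::chrono::milliseconds(10));  // waiters = 2, then throws
    } catch (const std::runtime_error&) {
        interrupted = true;
    }
    assert(interrupted);
    s.unlock("db.coll");  // waiters = 1, so the entry is not erased
    return s.entryCount();
}
```

In this model leakedEntriesAfterInterrupt() returns 1: nobody holds or waits for the lock any more, yet the entry stays in the map because its counter is stuck at 1.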

Interestingly, the condition variable's condition will be correct once the ScopedLock that the interrupted request was waiting on is destructed (because inProgress is set to false), so the next attempt to lock the serializer should succeed without waiting. However, the destructor uses notify_one(). If more than one thread was waiting on the lock and the interrupted request was the one signaled, the wakeup is lost and the other waiter(s) will hang.
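For illustration, one conventional way to avoid both symptoms is to undo the counter increment when the wait is abandoned and to wake every waiter on release. This is only a sketch under assumptions and is not claimed to be the change committed for this ticket: SaferSerializer, unlock() (standing in for the ScopedLock destructor), entryCount(), and entriesAfterInterruptWithUndo() are hypothetical names, and a timed wait_for that throws on timeout again stands in for the interruptible wait.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical variant of the scheme with the two hazards addressed.
class SaferSerializer {
    struct Entry {
        std::condition_variable cv;
        int waiters = 1;
        bool inProgress = true;
    };
    std::mutex _mutex;
    std::map<std::string, std::shared_ptr<Entry>> _entries;

public:
    // Throws std::runtime_error on timeout (stand-in for an interruption).
    void lock(const std::string& ns, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        auto it = _entries.find(ns);
        if (it == _entries.end()) {
            _entries.emplace(ns, std::make_shared<Entry>());
            return;
        }
        std::shared_ptr<Entry> e = it->second;
        ++e->waiters;
        if (!e->cv.wait_for(lk, timeout, [&] { return !e->inProgress; })) {
            // Undo our registration before reporting the interruption.
            // inProgress is still true here, so the current holder remains
            // counted and will erase the entry when it releases.
            --e->waiters;
            throw std::runtime_error("interrupted");
        }
        e->inProgress = true;
    }

    // Stand-in for the ScopedLock destructor.
    void unlock(const std::string& ns) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _entries.find(ns);
        if (it == _entries.end()) return;
        Entry& e = *it->second;
        --e.waiters;
        e.inProgress = false;
        // Wake all waiters so a wakeup cannot be consumed solely by a
        // waiter that is about to give up; each one rechecks inProgress.
        e.cv.notify_all();
        if (e.waiters == 0) _entries.erase(it);
    }

    std::size_t entryCount() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _entries.size();
    }
};

// Same scenario as the leak: one holder, one waiter that times out.
inline std::size_t entriesAfterInterruptWithUndo() {
    SaferSerializer s;
    s.lock("db.coll", std::chrono::milliseconds(0));  // holder
    try {
        s.lock("db.coll", std::chrono::milliseconds(10));  // times out, undoes ++
    } catch (const std::runtime_error&) {
    }
    s.unlock("db.coll");  // waiters reaches 0, entry erased
    return s.entryCount();
}
```

Here entriesAfterInterruptWithUndo() returns 0. If the wait itself throws, as an interruptible wait does, the same decrement would belong in a scope guard or catch block around the wait. notify_all() trades some spurious wakeups for not depending on which waiter receives the signal.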

Assignee: Janna Golden
Reporter: Jack Mulrow
Participants: Githook User, Jack Mulrow, Janna Golden, Luke Chen
Votes: 0
Watchers: 5

Created: Mar 22 2019 10:37:53 PM UTC
Updated: Oct 29 2023 10:22:40 PM UTC
Resolved: Apr 08 2019 09:30:43 PM UTC
Confidence Status Last Update: 08/Apr/19 3:05 PM
