[SERVER-40318] Condition variable wait in NamespaceSerializer::lock is not exception safe Created: 22/Mar/19  Updated: 29/Oct/23  Resolved: 08/Apr/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.1.10, 4.0.10

Type: Bug Priority: Major - P3
Reporter: Jack Mulrow Assignee: Janna Golden
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Sharding 2019-04-22
Participants:
Linked BF Score: 5

 Description   

The NamespaceSerializer is essentially an in-memory cache of the distributed lock meant to synchronize sharded metadata operations that must run on the config server primary, like _configsvrCreateCollection and _configsvrDropCollection. Roughly, the class works like this:

  1. Threads wishing to lock a namespace call NamespaceSerializer::lock(), which takes a class-wide mutex.
  2. Inside, it looks up the namespace in a map whose entries each hold a condition variable, a waiters counter, and an inProgress boolean.
    1. If there is no entry, one is created with a new condition variable, a waiters counter of 1, and inProgress set to true.
    2. If there is one, the thread increments that entry's waiters counter and waits on its condition variable until inProgress is false, then sets inProgress back to true and proceeds.
  3. The method then returns a ScopedLock object whose destructor decrements the waiters counter, sets inProgress to false, and calls notify_one() on the condition variable.
    1. If the waiters counter reaches 0, the entry for the namespace is removed from the map.
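The steps above can be modeled in a few dozen lines of standard C++. This is a simplified sketch, not the server code: the class name NaiveSerializer, the Interrupted exception, the entryCount() helper, and the use of a plain timeout (std::condition_variable::wait_for) to stand in for maxTimeMS are all hypothetical. It deliberately keeps the ordering described in the ticket, so it is not exception safe.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an interrupted request (maxTimeMS expiry or kill).
struct Interrupted : std::runtime_error {
    Interrupted() : std::runtime_error("operation interrupted") {}
};

// Simplified model of the serializer as described; NOT exception safe.
class NaiveSerializer {
    struct Entry {
        std::condition_variable cv;
        int waiters = 1;  // counts the current holder plus anyone waiting
        bool inProgress = true;
    };

public:
    class ScopedLock {
    public:
        ScopedLock(NaiveSerializer* s, std::string ns) : _s(s), _ns(std::move(ns)) {}
        ScopedLock(ScopedLock&& other) noexcept : _s(other._s), _ns(std::move(other._ns)) {
            other._s = nullptr;  // a moved-from lock releases nothing
        }
        ~ScopedLock() {
            if (!_s) return;
            std::lock_guard<std::mutex> lk(_s->_mutex);
            auto it = _s->_entries.find(_ns);
            std::shared_ptr<Entry> entry = it->second;
            entry->inProgress = false;
            entry->cv.notify_one();  // wakes at most one waiter
            if (--entry->waiters == 0) _s->_entries.erase(it);
        }

    private:
        NaiveSerializer* _s;
        std::string _ns;
    };

    // maxTime models maxTimeMS: a wait that expires throws Interrupted.
    ScopedLock lock(const std::string& ns, std::chrono::milliseconds maxTime) {
        std::unique_lock<std::mutex> lk(_mutex);
        auto it = _entries.find(ns);
        if (it == _entries.end()) {
            _entries.emplace(ns, std::make_shared<Entry>());
        } else {
            std::shared_ptr<Entry> entry = it->second;
            ++entry->waiters;  // counted before any ScopedLock exists
            if (!entry->cv.wait_for(lk, maxTime, [&] { return !entry->inProgress; })) {
                throw Interrupted();  // BUG: the increment above is never undone
            }
            entry->inProgress = true;
        }
        return ScopedLock(this, ns);
    }

    std::size_t entryCount() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _entries.size();
    }

private:
    std::mutex _mutex;
    std::map<std::string, std::shared_ptr<Entry>> _entries;
};
```

In this model, calling lock() with a short maxTime while another ScopedLock is held throws from the else branch, and after the holder releases, the entry stays in the map with a waiters count of 1.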

The waiters counter is incremented and the condition variable wait happens before the ScopedLock object is created, and the wait is interruptible. A request with maxTimeMS (or one that is killed) can therefore throw after incrementing the counter but before any ScopedLock exists to decrement it in its destructor. The counter then never reaches 0, and the entry for the namespace is never removed from the map.
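A common C++ remedy for this kind of leak is to tie the counter adjustment to an object whose destructor runs during stack unwinding. The sketch below isolates that scope-guard idea; WaiterRegistration and waitThenAcquire are hypothetical names, and the `interrupted` flag stands in for an interruptible wait that throws. It does not claim to be how the actual commit fixed the bug.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical guard: registers the caller as a waiter and, unless dismissed,
// unregisters it when destroyed (including during stack unwinding).
class WaiterRegistration {
public:
    explicit WaiterRegistration(int& waiters) : _waiters(waiters) { ++_waiters; }
    WaiterRegistration(const WaiterRegistration&) = delete;
    WaiterRegistration& operator=(const WaiterRegistration&) = delete;
    ~WaiterRegistration() {
        if (_active) --_waiters;  // undo the registration on any early exit
    }
    // Called once the wait succeeds: the count is now owned by the ScopedLock.
    void dismiss() { _active = false; }

private:
    int& _waiters;
    bool _active = true;
};

// Hypothetical stand-in for "wait on the condition variable, then take the lock".
void waitThenAcquire(int& waiters, bool interrupted) {
    WaiterRegistration reg(waiters);
    if (interrupted) throw std::runtime_error("interrupted");  // guard undoes ++waiters
    reg.dismiss();  // success: keep the increment for the future ScopedLock
}
```

In the serializer itself, the guard would be declared after the std::unique_lock on the class mutex, so that it is destroyed first and adjusts the counter while the mutex is still held. This assumes the interruptible wait reacquires the mutex before throwing, as std::condition_variable::wait does.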

Interestingly, the condition variable's condition will be correct once the ScopedLock that the interrupted request was waiting on is destroyed (its destructor sets inProgress to false), so the next attempt to lock the serializer should succeed without waiting. However, because the destructor uses notify_one(), if more than one thread was waiting on the lock and the interrupted request was the one signaled, that wakeup is lost and the other waiter(s) will hang.
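Both problems can be handled in the waiting path: undo the waiters increment on any exception and, if the lock is already free, pass the possibly consumed notify_one() on to another waiter. The sketch below is one way to do that under stated assumptions, not necessarily what the linked commits do. FixedSerializer, Interrupted, entryCount() and the `killed` flag (a hypothetical stand-in for operation interruption) are illustrative names, and the waiter polls with a short wait_for so that it can observe a kill. Because it checks for interruption right after waking, it can reproduce the "woken, then throws" case.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

struct Interrupted : std::runtime_error {
    Interrupted() : std::runtime_error("operation interrupted") {}
};

class FixedSerializer {
    struct Entry {
        std::condition_variable cv;
        int waiters = 1;  // holder plus waiters
        bool inProgress = true;
    };

public:
    class ScopedLock {
    public:
        ScopedLock(FixedSerializer* s, std::string ns) : _s(s), _ns(std::move(ns)) {}
        ScopedLock(ScopedLock&& other) noexcept : _s(other._s), _ns(std::move(other._ns)) {
            other._s = nullptr;
        }
        ~ScopedLock() {
            if (!_s) return;
            std::lock_guard<std::mutex> lk(_s->_mutex);
            auto it = _s->_entries.find(_ns);
            std::shared_ptr<Entry> entry = it->second;
            entry->inProgress = false;
            entry->cv.notify_one();
            if (--entry->waiters == 0) _s->_entries.erase(it);
        }

    private:
        FixedSerializer* _s;
        std::string _ns;
    };

    // `killed` is a hypothetical stand-in for the request being interrupted.
    ScopedLock lock(const std::string& ns, const std::atomic<bool>& killed) {
        std::unique_lock<std::mutex> lk(_mutex);
        auto it = _entries.find(ns);
        if (it == _entries.end()) {
            _entries.emplace(ns, std::make_shared<Entry>());
            return ScopedLock(this, ns);
        }
        std::shared_ptr<Entry> entry = it->second;
        ++entry->waiters;
        try {
            while (entry->inProgress) {
                entry->cv.wait_for(lk, std::chrono::milliseconds(5));
                if (killed) throw Interrupted();  // may fire right after a notify_one()
            }
        } catch (...) {
            // The mutex is held again here; undo our registration.
            if (--entry->waiters == 0) {
                _entries.erase(ns);
            } else if (!entry->inProgress) {
                entry->cv.notify_one();  // hand on a wakeup we may have consumed
            }
            throw;
        }
        entry->inProgress = true;
        return ScopedLock(this, ns);
    }

    std::size_t entryCount() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _entries.size();
    }

private:
    std::mutex _mutex;
    std::map<std::string, std::shared_ptr<Entry>> _entries;
};
```

An interrupted waiter now leaves the counter where it found it, so the entry is erased once the holder releases. The extra notify_one() is harmless when nobody needs it, because every waiter re-checks inProgress in its loop. Polling is only a convenience of this sketch; a real implementation would more likely have the kill itself wake the waiting thread.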



 Comments   
Comment by Luke Chen [ 11/Apr/19 ]

Fixing up the fix version, as this ticket was not included in the 4.0.9 release.

Comment by Githook User [ 09/Apr/19 ]

Author:

{'email': 'golden.janna@gmail.com', 'name': 'jannaerin', 'username': 'jannaerin'}

Message: SERVER-40318 Make condition variable wait in NamespaceSerializer exception safe

(cherry picked from commit 07bcfd825c6ad2c347329af1a1b7634029048871)
Branch: v4.0
https://github.com/mongodb/mongo/commit/73536c31314daef6c68217aed5f8d6ddd432d15b

Comment by Githook User [ 08/Apr/19 ]

Author:

{'email': 'golden.janna@gmail.com', 'name': 'jannaerin', 'username': 'jannaerin'}

Message: SERVER-40318 Make condition variable wait in NamespaceSerializer exception safe
Branch: master
https://github.com/mongodb/mongo/commit/07bcfd825c6ad2c347329af1a1b7634029048871

Generated at Thu Feb 08 04:54:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.