The RSM theoretically has a contract that all waiter promises are fulfilled with an unsatisfied read preference error on drop(). It also assumes that no new promises can be added once it has been removed from the RSM Manager.
But there's a race that goes something like this:
|proceed to just before SetState::mutex acquisition, i.e. after check for isRemoved|
|swap monitors out and mark in shutdown + drop anchor|
|acquire lock and emplace waiter|
|shutdown and join task executor|
This can lead to a situation where we add a waiter to the RSM after a drop (with no obvious way to avoid having to break that added promise).
The fix is to:
- change removal from the monitor manager into a drop() on the RSM
- pivot from isRemovedFromManager to isDropped, and update isDropped under SetState::mutex
- check for isDropped instead of globalRSMonitorManager.isShutdown in notify
This should ensure that removing an RSM from a manager is the same as dropping one, and that drops serialize with calls to get (so that waiters can't race in during shutdown).
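
For illustration only, here is a minimal sketch of the serialization idea, not the actual patch. SketchMonitor, getHost(), pendingWaiters() and failedWithError() are hypothetical names, std::promise stands in for the real waiter promises, and a std::runtime_error stands in for the unsatisfied read preference error. The point it tries to show is that the isDropped check and the waiter emplace happen under the same mutex that drop() takes, so a waiter cannot slip in after the drop.

```cpp
#include <cstddef>
#include <exception>
#include <future>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the monitor and its shared state; not the real RSM.
class SketchMonitor {
public:
    // Register a waiter, or fail it right away if the monitor was dropped.
    std::future<std::string> getHost() {
        std::promise<std::string> promise;
        auto future = promise.get_future();

        std::lock_guard<std::mutex> lock(_mutex);
        if (_isDropped) {
            // Checked under the same mutex that drop() holds: no window
            // between the check and the emplace.
            promise.set_exception(makeDroppedError());
            return future;
        }
        _waiters.push_back(std::move(promise));
        return future;
    }

    // Mark the monitor dropped and fail every outstanding waiter.
    void drop() {
        std::vector<std::promise<std::string>> toFail;
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _isDropped = true;
            toFail.swap(_waiters);
        }
        // Fulfil the promises outside the lock.
        for (auto& waiter : toFail) {
            waiter.set_exception(makeDroppedError());
        }
    }

    std::size_t pendingWaiters() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _waiters.size();
    }

private:
    // Assumption: a plain runtime_error models the unsatisfied read
    // preference error.
    static std::exception_ptr makeDroppedError() {
        return std::make_exception_ptr(
            std::runtime_error("unsatisfied read preference: monitor dropped"));
    }

    std::mutex _mutex;
    bool _isDropped = false;  // only read or written while holding _mutex
    std::vector<std::promise<std::string>> _waiters;
};

// Returns true if the future completed with the stand-in error.
inline bool failedWithError(std::future<std::string>& future) {
    try {
        future.get();
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

In this sketch a waiter registered before drop() is failed by drop(), and a getHost() call that arrives after drop() gets an already-failed future instead of being queued, so nothing is left to be broken at shutdown.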