[SERVER-42634] RSM can receive requests (that won't be failed) after drop() Created: 05/Aug/19  Updated: 29/Oct/23  Resolved: 09/Aug/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Bug Priority: Major - P3
Reporter: Mira Carey Assignee: Mira Carey
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-42633 SetState::drop() doesn't cancel outst... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Service Arch 2019-08-12, Service Arch 2019-08-26
Participants:
Linked BF Score: 39

 Description   

The RSM theoretically has a contract that all waiter-promises are fulfilled with an unsatisfied read preference error on drop(). It also assumes that no new promises can be added once it has been removed from the RSM Manager.

But there's a race that goes something like this:

getHostOrRefreshCaller                  RSMM::shutdown()

proceed to just before SetState::mutex
acquisition, i.e. after check for
isRemoved
                                        swap monitors out and mark in
                                        shutdown + drop anchor
acquire lock and emplace waiter
                                        shutdown and join task executor

This can lead to a situation where we add a waiter to the RSM after a drop(), with no obvious way to avoid breaking that added promise.

The fix is to:

  • change removal from the monitor manager into a drop() on the RSM
  • pivot from isRemovedFromManager to isDropped, and update isDropped under the setState mutex
  • check for isDropped instead of globalRSMonitorManager.isShutdown in notify

This should ensure that removing an RSM from a manager is the same as dropping one, and that drops serialize with calls to get (so that waiters can't race in during shutdown).
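
The serialization described above can be illustrated with a minimal, standalone sketch. It is not the actual MongoDB code: SetStateSketch, getHostOrRefresh's signature, waiterCount, and the use of std::promise and std::runtime_error are all assumptions made for illustration. The only point it tries to show is that when isDropped is both written and read under the same mutex, a caller either registers its waiter before drop() (and is failed by drop()) or observes isDropped afterwards (and is failed immediately), so no waiter is stranded.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <future>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the RSM's SetState. All names are illustrative.
class SetStateSketch {
public:
    // Registers a waiter, or fails it right away if the set was already dropped.
    std::future<std::string> getHostOrRefresh() {
        std::lock_guard<std::mutex> lk(_mutex);
        std::promise<std::string> p;
        std::future<std::string> fut = p.get_future();
        if (_isDropped) {
            // Checked under the same mutex that drop() takes, so a waiter
            // cannot slip in after the drop has completed.
            p.set_exception(std::make_exception_ptr(
                std::runtime_error("unsatisfied read preference: dropped")));
            return fut;
        }
        _waiters.push_back(std::move(p));
        return fut;
    }

    // Marks the set as dropped and fails every waiter registered so far.
    void drop() {
        std::vector<std::promise<std::string>> toFail;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _isDropped = true;
            toFail.swap(_waiters);
        }
        // Fulfil the promises outside the lock.
        for (auto& p : toFail) {
            p.set_exception(std::make_exception_ptr(
                std::runtime_error("unsatisfied read preference: dropped")));
        }
    }

    // Number of waiters still pending (for inspection in the example only).
    std::size_t waiterCount() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _waiters.size();
    }

private:
    std::mutex _mutex;
    bool _isDropped = false;
    std::vector<std::promise<std::string>> _waiters;
};
```

In this sketch, a future obtained before drop() and a future obtained after drop() both end up holding the error, and the waiter list stays empty after the drop.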



 Comments   
Comment by Githook User [ 09/Aug/19 ]

Author:

{'name': 'Jason Carey', 'email': 'jcarey@argv.me', 'username': 'hanumantmk'}

Message: SERVER-42634 RSM can strand requests after drop()

The RSM theoretically has a contract that all waiter-promises are
fulfilled with an unsatisfied read preference error on drop(). It also
supposes that after being removed from the RSM Manager that no new
promises can be added.

But there's a race that goes something like this:

getHostOrRefreshCaller           RSMM::shutdown()
=======================================================================
proceed to just before
SetState::mutex acquisition,
i.e. after check for isRemoved
                                 swap monitors out and mark in
                                 shutdown + drop anchor
acquire lock and emplace waiter
                                 shutdown and join task executor
=======================================================================

This can lead to a situation where we add waiter to the rsm after a drop
(and with no obvious path forward towards not needing to break that
added promise)

The fix is to:

  • change removal from the monitor manager into a drop() on the RSM
  • pivot from isRemovedFromManager to isDropped, and update isDropped
    under the setState mutex
  • check for isDropped instead of globalRSMonitorManager.isShutdown in
    notify

This should ensure that removing a RSM from a manager is the same as
dropping one and that drops serialize with calls to get (so that waiters
can't race in during shutdown).
Branch: master
https://github.com/mongodb/mongo/commit/e5b4e288262e99644d8ff1627565dcbb3e94b6a1
