[SERVER-33313] add "fromMovePrimary" flag to CursorManager::invalidateAll() and call it when entering the movePrimary critical section Created: 13/Feb/18  Updated: 27/Oct/23  Resolved: 18/Apr/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Janna Golden
Resolution: Works as Designed Votes: 0
Labels: todo_in_code
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Sharding 2018-04-09, Sharding 2018-04-23
Participants:

 Description   

When the "fromMovePrimary" flag is true, CursorManager::invalidateAll() should set the Status for all cursors it kills to StaleDbVersion rather than QueryPlanKilled (which is what it currently sets).
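As an illustration only, the proposed flag could behave roughly like the sketch below. `ErrorCode`, `Status`, and `CursorManagerSketch` are hypothetical, simplified stand-ins and not MongoDB's real types or signatures. The sketch only shows how the flag selects the kill status for every registered cursor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only; these are not
// MongoDB's real Status, ErrorCodes, or CursorManager types.
enum class ErrorCode { OK, QueryPlanKilled, StaleDbVersion };

struct Status {
    ErrorCode code;
    std::string reason;
};

class CursorManagerSketch {
public:
    // Registers a live cursor and returns its index.
    std::size_t registerCursor() {
        killStatuses_.push_back(Status{ErrorCode::OK, ""});
        return killStatuses_.size() - 1;
    }

    // Kills every registered cursor. With fromMovePrimary == true the kill
    // status is StaleDbVersion, so a resuming reader or writer can be retried
    // against the new primary shard; otherwise it is QueryPlanKilled.
    void invalidateAll(bool fromMovePrimary, const std::string& reason) {
        const ErrorCode code = fromMovePrimary ? ErrorCode::StaleDbVersion
                                               : ErrorCode::QueryPlanKilled;
        for (Status& s : killStatuses_) {
            s = Status{code, reason};
        }
    }

    const Status& killStatus(std::size_t id) const { return killStatuses_.at(id); }

private:
    std::vector<Status> killStatuses_;  // OK means "still alive"
};
```

A caller entering the movePrimary critical section would pass `true`. Every other existing caller would pass `false` and keep today's QueryPlanKilled behavior.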

Then, CursorManager::invalidateAll() with "fromMovePrimary=true" should be called when entering the movePrimary critical section. This will ensure that when the unsharded collections that were moved are dropped at the end of movePrimary, any yielded readers and writers on those unsharded collections throw StaleDbVersion (and so are retried against the new primary shard) rather than QueryPlanKilled (which would not cause them to be retried).
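The reason the error code matters is that only some errors are retried. The sketch below is a hypothetical model of that retry decision and not the actual mongos retry logic. `shouldRetryOnNewPrimary`, `runAgainstPrimary`, and the shard names are assumptions for illustration. StaleDbVersion triggers a routing refresh and a retry against the new primary, while QueryPlanKilled is returned to the caller unchanged.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical error codes for illustration only.
enum class ErrorCode { OK, QueryPlanKilled, StaleDbVersion };

// Hypothetical retry policy: only StaleDbVersion means "refresh the
// database's primary shard and try again"; QueryPlanKilled is surfaced.
bool shouldRetryOnNewPrimary(ErrorCode code) {
    return code == ErrorCode::StaleDbVersion;
}

// Runs `attempt` against `primaryShard`. On StaleDbVersion it asks
// `refreshPrimary` for the current primary and retries, at most `maxRetries` times.
ErrorCode runAgainstPrimary(const std::function<ErrorCode(const std::string&)>& attempt,
                            const std::function<std::string()>& refreshPrimary,
                            std::string primaryShard,
                            int maxRetries) {
    for (int i = 0;; ++i) {
        const ErrorCode result = attempt(primaryShard);
        if (!shouldRetryOnNewPrimary(result) || i >= maxRetries) {
            return result;
        }
        primaryShard = refreshPrimary();
    }
}
```

If a yielded operation on the old primary is killed with StaleDbVersion, the second attempt reaches the new primary and succeeds. If it is killed with QueryPlanKilled, the error is returned after a single attempt.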


Note: we do not want to call CursorManager::invalidateAll() when entering the moveChunk critical section, because it kills both readers and writers, and we do not want to kill yielded readers on sharded collections. Those readers hold a ScopedCollectionMetadata, which prevents the RangeDeleter from deleting the data out from under them.
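The pinning idea can be modeled with reference counting. The sketch below is an assumption-laden simplification and not how MongoDB's ScopedCollectionMetadata or RangeDeleter are actually implemented. `MetadataManagerSketch`, `pin()`, and `canDeleteOrphanedRange()` are hypothetical names. A reader keeps a `shared_ptr` to the metadata snapshot it started with, and the migrated-away range may be deleted only after no reader still holds the retired snapshot.

```cpp
#include <cassert>
#include <memory>

// Hypothetical model of a collection metadata snapshot.
struct MetadataSnapshot {
    int version;
};

class MetadataManagerSketch {
public:
    explicit MetadataManagerSketch(int version)
        : active_(std::make_shared<MetadataSnapshot>(MetadataSnapshot{version})) {}

    // A reader pins the current snapshot, loosely analogous to holding a
    // ScopedCollectionMetadata for the duration of the query.
    std::shared_ptr<const MetadataSnapshot> pin() const { return active_; }

    // On migration commit, install new metadata and retire the old snapshot,
    // whose moved-away range becomes orphaned.
    void commitMigration(int newVersion) {
        retired_ = active_;
        active_ = std::make_shared<MetadataSnapshot>(MetadataSnapshot{newVersion});
    }

    // The orphaned range may be deleted only if no reader still pins the
    // retired snapshot (this manager holds the only remaining reference).
    bool canDeleteOrphanedRange() const {
        return !retired_ || retired_.use_count() == 1;
    }

private:
    std::shared_ptr<MetadataSnapshot> active_;
    std::shared_ptr<MetadataSnapshot> retired_;
};
```

While a yielded reader holds the old snapshot, deletion is blocked. This is why such readers can survive a moveChunk and do not need to be killed.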

Because we cannot use CursorManager::invalidateAll() on entering the moveChunk critical section without also killing yielded readers, we will continue to call checkShardVersion() in OpObservers so that yielded writers throw StaleShardVersion if they resume after a moveChunk critical section has been entered.
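The writer-side check can be sketched as a hook that runs on every write, including a write that resumes after yielding. Everything below is hypothetical: `ShardingStateSketch`, `onInsertObserver`, and `StaleShardVersionError` are stand-ins and not MongoDB's real OpObserver or checkShardVersion() signatures. Once the critical section has been entered, the hook throws instead of letting the write proceed.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception standing in for a StaleShardVersion error.
struct StaleShardVersionError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class ShardingStateSketch {
public:
    void enterCriticalSection() { inCriticalSection_ = true; }

    // Throws if the write cannot safely proceed: the migration critical
    // section is active or the caller's shard version no longer matches.
    void checkShardVersion(int receivedVersion) const {
        if (inCriticalSection_ || receivedVersion != currentVersion_) {
            throw StaleShardVersionError("stale shard version");
        }
    }

private:
    bool inCriticalSection_ = false;
    int currentVersion_ = 1;
};

// Hypothetical op observer hook invoked for each insert.
void onInsertObserver(const ShardingStateSketch& state, int receivedVersion) {
    state.checkShardVersion(receivedVersion);
}
```

Because the check runs at write time rather than by killing cursors, yielded readers are left untouched. Only writers that resume into the critical section see the error.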



 Comments   
Comment by Janna Golden [ 18/Apr/18 ]

All of the commands that send a database version either do not yield or do not have a plan executor, so closing as Works as Designed.

Comment by Charlie Swanson [ 14/Feb/18 ]

Instead of adding a 'fromMovePrimary' boolean, I think it would be clearer to have callers of invalidateAll pass either an error code or a Status indicating why the cursors/PlanExecutors are being invalidated. Callers already pass a reason string, and this change motivates a custom error code, so a Status seems best.
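The suggested alternative might look roughly like the following sketch. `CursorManagerSketch`, `Status`, and `ErrorCode` are hypothetical simplifications and not MongoDB's real API. The point is that the caller supplies both the error code and the reason in one Status, so no special-purpose boolean is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only.
enum class ErrorCode { OK, QueryPlanKilled, StaleDbVersion };

struct Status {
    ErrorCode code;
    std::string reason;
};

class CursorManagerSketch {
public:
    // Registers a live cursor and returns its index.
    std::size_t registerCursor() {
        killStatuses_.push_back(Status{ErrorCode::OK, ""});
        return killStatuses_.size() - 1;
    }

    // The caller decides why cursors are invalidated; the same Status
    // (code plus reason) is recorded for every killed cursor.
    void invalidateAll(const Status& why) {
        for (Status& s : killStatuses_) {
            s = why;
        }
    }

    const Status& killStatus(std::size_t id) const { return killStatuses_.at(id); }

private:
    std::vector<Status> killStatuses_;  // OK means "still alive"
};
```

With this shape, movePrimary would pass a StaleDbVersion Status, and existing callers would pass QueryPlanKilled with their current reason string.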

Generated at Thu Feb 08 04:33:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.