-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 8.3.0-rc0
-
Component/s: None
-
Catalog and Routing
-
Fully Compatible
-
ALL
-
CAR Team 2025-09-29
-
200
-
🟩 Routing and Topology
Cause
SERVER-103706 removed the handling of the StaleConfig error from strategy.cpp in favor of using router role loops only where needed.
As a consequence, ShardsvrMoveRange, which is called by ConfigSvrMoveRange, keeps being retried against the same shard with the same range whenever it returns StaleConfig.
Note that ShardsvrMoveRange issues StaleConfig by manually checking that the range is fully owned by the shard, not by checking the version as every other command does. Simply updating the version is therefore not enough; the entire ConfigSvrMoveRange request needs to be updated.
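A minimal sketch of the kind of ownership check described above, written as a toy Python model rather than the server's C++ code. The names `Chunk`, `StaleConfig`, and `check_range_fully_owned` are hypothetical, and ranges are modeled as half-open integer intervals for simplicity. It illustrates why refreshing the version does not help: the result depends only on the requested range and the current chunk layout.

```python
from dataclasses import dataclass


class StaleConfig(Exception):
    """Stand-in for the server's StaleConfig error (toy model only)."""


@dataclass(frozen=True)
class Chunk:
    lo: int      # inclusive lower bound (assumption: integer keys)
    hi: int      # exclusive upper bound
    shard: str   # name of the owning shard


def check_range_fully_owned(chunks, shard, lo, hi):
    """Raise StaleConfig unless `shard` owns every key in [lo, hi)."""
    cursor = lo
    for chunk in sorted(chunks, key=lambda c: c.lo):
        if chunk.hi <= cursor or chunk.lo >= hi:
            continue  # chunk does not overlap the part still to be covered
        if chunk.lo > cursor or chunk.shard != shard:
            raise StaleConfig(f"{shard} does not fully own [{lo}, {hi})")
        cursor = chunk.hi
    if cursor < hi:
        raise StaleConfig(f"{shard} does not fully own [{lo}, {hi})")


def is_fully_owned(chunks, shard, lo, hi):
    """Boolean wrapper around the check, for convenience."""
    try:
        check_range_fully_owned(chunks, shard, lo, hi)
        return True
    except StaleConfig:
        return False
```

With a layout where shard1 owns only part of the requested range, the check fails on every call, no matter how often the caller refreshes its routing information, until the request itself is narrowed.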
Example
Imagine one chunk on shard1 and no chunks on shard2:
shard1: [min,max]
shard2: []
- We issue a moveRange (moveRange1) to move the entire chunk to shard2.
- A parallel moveRange (moveRange2) happens, so that the layout becomes:
shard1: [min,half]
shard2: [half,max]
The loop above for moveRange1 would keep retrying with [min,max] against shard1, because the ConfigSvrMoveRange request doesn't change and the shard is chosen based on min.
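The scenario can be reproduced with a small simulation. This is a toy Python model under stated assumptions, not the server code: `config_move_range`, `shard_move_range`, and `owner_of` are hypothetical names, keys are integers with half-open ranges, and the retry loop is capped at `max_attempts` only so that the demonstration terminates.

```python
class StaleConfig(Exception):
    """Stand-in for the server's StaleConfig error (toy model only)."""


def owner_of(table, key):
    """Return the shard owning `key`; table entries are (lo, hi, shard)."""
    for lo, hi, shard in table:
        if lo <= key < hi:
            return shard
    raise KeyError(key)


def shard_move_range(table, shard, lo, hi):
    """Model of the shard-side check: the shard must own all of [lo, hi)."""
    for c_lo, c_hi, c_shard in table:
        overlaps = c_lo < hi and lo < c_hi
        if overlaps and c_shard != shard:
            raise StaleConfig(f"{shard} does not fully own [{lo}, {hi})")


def config_move_range(get_table, request, max_attempts):
    """Retry on StaleConfig, always reusing the same unchanged request."""
    for attempt in range(1, max_attempts + 1):
        table = get_table()                      # routing info is refreshed
        target = owner_of(table, request["min"])  # shard chosen from min
        try:
            shard_move_range(table, target, request["min"], request["max"])
            return ("ok", attempt)
        except StaleConfig:
            continue  # the request is not updated, so the next try fails too
    return ("gave_up", max_attempts)


# Layout after moveRange2 has split the data between the two shards.
after_move_range2 = [(0, 50, "shard1"), (50, 100, "shard2")]
move_range1 = {"min": 0, "max": 100}
result = config_move_range(lambda: after_move_range2, move_range1, 5)
```

Here `result` is `("gave_up", 5)`: every attempt targets shard1 with the full range and fails the same way. Without the artificial cap, the loop in this model would not terminate.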
Before the changes
Before, the StaleConfig would have been propagated to the mongos, which would have stopped retrying and simply reported to the user that the requested range no longer exists.
We should probably move that check into ConfigSvrMoveRange.
- causes
-
SERVER-111458 Add 11089203 as an expected error in random_manual_migrations.js
-
- Closed
-
- is caused by
-
SERVER-103706 Get rid of the stale errors handling on strategy.cpp
-
- Closed
-
- is related to
-
SERVER-103706 Get rid of the stale errors handling on strategy.cpp
-
- Closed
-