- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 8.3.0-rc0, 8.3.0-rc1, 8.3.0-rc2
- Component/s: None
- Catalog and Routing
- ALL
- v8.3
This issue only affects 8.3, since it is the first version in which SPM-3729 is enabled.
Before SPM-3729
Former primary shards that were targeted by a stale router would: 1) install a version on the DSS if they did not already have one, and 2) send that version back to the router. This version was potentially stale (it indicated that some other shard was the primary, and certainly not the responding shard), but it gave the routers a hint about which shard could be the new primary. This behavior allowed all the stale requests to be resolved with potentially a single refresh: every request returned the same wanted version to the router, the router performed just one refresh against the CSRS, and it then resolved all the requests with the new routing information.
After SPM-3729
With the new implementation, shards no longer keep a stale view of which shard they believe is the primary. After confirming that they are not the primary shard, they return a stale db error to the router without attaching a wanted version. This causes the router to invalidate its cache as part of its error handling. With this new flow, a single refresh no longer resolves all the requests; the number of refreshes depends on the specific interleaving of routing invalidations and routing metadata refreshes from the CSRS.
Note that this inefficient resolution only takes place after a movePrimary or a DropDB followed by re-creation of the database.
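The difference between the two flows can be illustrated with a toy model. The sketch below is not the server's CatalogCache code: `StaleDbError`, `ToyRouter`, and `count_refreshes` are hypothetical names, versions are plain integers, and the model assumes the wanted version returned by the former primary is newer than what the router has cached. It only shows why a reply that carries a wanted version lets the router skip redundant refreshes, while a reply without one forces an unconditional invalidation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StaleDbError:
    # Version hint returned by the shard; None models a reply with no wanted version.
    wanted_version: Optional[int]


class ToyRouter:
    """Hypothetical router-side cache entry for a single database."""

    def __init__(self, cached_version: int, authoritative_version: int):
        self.cached_version: Optional[int] = cached_version
        # What a refresh from the config servers would return in this model.
        self.authoritative_version = authoritative_version
        self.refreshes = 0

    def on_stale_error(self, err: StaleDbError) -> None:
        if err.wanted_version is None:
            # No hint available: invalidate unconditionally.
            self.cached_version = None
        elif self.cached_version is not None and self.cached_version < err.wanted_version:
            # The hint is newer than the cached entry: invalidate.
            self.cached_version = None
        # Otherwise the cache is already at least as new as the hint: keep it.

    def route(self) -> int:
        if self.cached_version is None:
            # Cache miss: one refresh round trip.
            self.refreshes += 1
            self.cached_version = self.authoritative_version
        return self.cached_version


def count_refreshes(errors: List[StaleDbError], reroute_after_each: bool = True) -> int:
    """Count refreshes needed to absorb a batch of stale errors.

    reroute_after_each=True models an interleaving where the router retries
    (and therefore refreshes if needed) between consecutive stale errors.
    """
    router = ToyRouter(cached_version=1, authoritative_version=2)
    for err in errors:
        router.on_stale_error(err)
        if reroute_after_each:
            router.route()
    router.route()  # final retry once all errors have been handled
    return router.refreshes


if __name__ == "__main__":
    with_hint = [StaleDbError(2)] * 3
    without_hint = [StaleDbError(None)] * 3
    print(count_refreshes(with_hint))                                # 1
    print(count_refreshes(without_hint))                             # 3
    print(count_refreshes(without_hint, reroute_after_each=False))   # 1
```

In this model, three stale replies that carry the same wanted version cost one refresh regardless of interleaving, whereas three replies without a wanted version cost between one and three refreshes depending on when the retries happen relative to the invalidations.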
Thanks to kaloian.manassiev@mongodb.com for pointing out the difference in CatalogCache behavior between collections and databases.
- is blocked by: SERVER-122485 Revert "Enable feature flag for SPM-3729" (Shards persist database metadata authoritatively) - Closed
- is depended on by: SERVER-122739 Enable feature flag for SPM-3729 - Blocked