- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 7.0.0, 8.0.0, 8.2.0
- Component/s: None
- Catalog and Routing
- ALL
- 0
- 🟩 Routing and Topology
In v8.2 and below, a timeseries $out that targets a different database can fail to converge and end up bubbling StaleConfig to the user. The situation arises as follows:
1. Database "targetDb" exists on Shard1.
2. However, a stale router believes "targetDb" is on Shard0.
3. A timeseries $out from "sourceDb.sourceColl" to "targetDb.myTs" on the stale router heuristically decides to execute the $out on Shard0, since that is where it believes targetDb lives. By itself, that is just a missed optimization.
4. The $out runs almost to completion, until it has to create the timeseries view on targetDb.myTs.
5. The creation on Shard1 ends up bubbling a StaleConfig error because:
   - 5.1. Creating a legacy timeseries collection does two ShardVersion checks (one for the buckets NSS and one for the view NSS), and both can throw StaleConfig.
   - 5.2. The Shard Role loop on the ServiceEntryPoint will only refresh and retry once.
   - 5.3. The routing on cluster::createCollection is a DBPrimaryRouter, so it won't retry StaleConfig.
6. This error tears down the entire $out and causes it to be retried. The stale router has still not learned the placement of targetDb, and Shard1 has still not discovered the filtering metadata for targetDb.myTs, so every retry fails the same way until the max retries are exhausted and StaleConfig is bubbled up to the user.
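The interaction between the two ShardVersion checks and the single refresh-and-retry can be illustrated with a small toy model. This is only a sketch, not server code: `ToyShard`, `run_with_retries`, and `run_out` are hypothetical names, the namespaces are illustrative, and the model assumes (following the description above) that nothing learned during a failed $out attempt carries over to the next one.

```python
class StaleConfig(Exception):
    """Toy stand-in for the server's StaleConfig error."""

    def __init__(self, nss):
        super().__init__(nss)
        self.nss = nss


class ToyShard:
    """Hypothetical model of a shard that lacks filtering metadata."""

    def __init__(self):
        self.known = set()  # namespaces whose metadata has been refreshed

    def check_shard_version(self, nss):
        if nss not in self.known:
            raise StaleConfig(nss)

    def refresh(self, nss):
        self.known.add(nss)

    def create_timeseries(self, buckets_nss, view_nss):
        # Two separate checks; each one can throw StaleConfig.
        self.check_shard_version(buckets_nss)
        self.check_shard_version(view_nss)
        return "created"


def run_with_retries(shard, max_retries):
    """Retry loop that refreshes the offending namespace after each failure."""
    retries = 0
    while True:
        try:
            return shard.create_timeseries(
                "targetDb.system.buckets.myTs", "targetDb.myTs"
            )
        except StaleConfig as exc:
            if retries >= max_retries:
                raise
            shard.refresh(exc.nss)
            retries += 1


def run_out(max_inner_retries, max_out_retries):
    """Outer $out retry; assumes each attempt starts from the same stale state."""
    for _ in range(max_out_retries):
        shard = ToyShard()  # assumption: no state survives a failed attempt
        try:
            return run_with_retries(shard, max_inner_retries)
        except StaleConfig:
            continue
    raise StaleConfig("retries exhausted")
```

In this model, `run_out(max_inner_retries=1, max_out_retries=3)` raises `StaleConfig` on every attempt, because one retry can clear only one of the two checks. With `max_inner_retries=2` the first attempt already succeeds, which mirrors why allowing multiple retries in the Shard Role loop resolves the problem.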
Notes:
- This precise stale router setup is required so that Shard1 never learns the filtering metadata for targetDb.myTs and keeps failing twice on (5.1).
- On v8.3+ this is fixed by SERVER-77402, which allows multiple retries on (5.2).
A reproducer is attached.
The proper fix is likely to do a very targeted backport of SERVER-77402 to v8.2 and lower.
- is related to: SERVER-77402 Replace instances of the ShardRole retry loop pattern with a unified implementation (Closed)
- related to: SERVER-123636 Disable multirouting on timeseries_out_non_sharded.js cross-DB test cases (Closed)