- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 8.3.0-rc0, 8.2.0
- Component/s: None
- Catalog and Routing
- ALL
- CAR Team 2025-12-08
- 🟩 Routing and Topology
The explain command for write operations (e.g. insert/delete/update) on a legacy tracked timeseries collection, when executed from a router with a stale cache, can fail with a StaleConfig exception after exhausting the 10 retries.
At a high level, this is what happens:
- The stale router thinks the collection is a tracked legacy timeseries.
- It forwards the command to the shard(s) using the timeseries view namespace, but attaches the shard version of the associated timeseries buckets collection.
- The shard receives the command and checks the shard version against the received namespace (the timeseries view). Since the collection has been dropped and recreated as a normal collection, it sends a StaleVersion error to the router, informing it that the view namespace is now unsharded.
- The router receives the StaleVersion error, refreshes the timeseries view namespace but not the buckets namespace, and retries the operation from the beginning. Because the cached information for the buckets namespace is still stale, every retry fails the same way until the retries are exhausted.
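The loop above can be reproduced with a small toy model. This is only an illustrative sketch, not the server's implementation: the namespaces, the version strings, and every function name are hypothetical, and the StaleVersion/StaleConfig distinction is collapsed into a single exception.

```python
UNSHARDED = "UNSHARDED"
MAX_RETRIES = 10  # retry budget mentioned in the ticket

class StaleConfig(Exception):
    """Hypothetical stand-in for the shard's stale-version response."""
    def __init__(self, nss):
        super().__init__(f"stale routing info for {nss}")
        self.nss = nss

# Ground truth on the shard: "db.ts" was dropped and recreated as a
# normal collection, so the buckets namespace no longer exists.
SHARD_VERSIONS = {"db.ts": UNSHARDED}

def shard_explain(nss, attached_version):
    # The shard checks the attached version against the namespace it received.
    if SHARD_VERSIONS.get(nss, UNSHARDED) != attached_version:
        raise StaleConfig(nss)
    return "explain output"

def router_explain_buggy(cache, view_ns, buckets_ns):
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            # Mismatch: view namespace paired with the buckets shard version.
            return shard_explain(view_ns, cache[buckets_ns]), attempt
        except StaleConfig as err:
            # Only the namespace named in the error (the view) is refreshed.
            cache[err.nss] = SHARD_VERSIONS.get(err.nss, UNSHARDED)
    # The real router would surface StaleConfig here; None keeps the demo simple.
    return None, MAX_RETRIES

stale_cache = {"db.ts": UNSHARDED, "db.system.buckets.ts": "v1"}
result, attempts = router_explain_buggy(stale_cache, "db.ts", "db.system.buckets.ts")
print(result, attempts, stale_cache["db.system.buckets.ts"])
```

In this model the buckets entry is never refreshed, so all attempts are rejected for the same reason.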
The solution is to make the router always send a matching namespace and shard version along with the command. In this case, since the router thinks the collection is a tracked legacy timeseries, it should convert the command to target the system.buckets namespace and forward the translated command to the shard(s) along with the shard version of the system.buckets namespace. If we do this, the shard will correctly send a StaleVersion error for the system.buckets namespace, the router will refresh the cache entry associated with the system.buckets namespace, and the two will eventually agree on the shard version of the collection.
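A minimal sketch of the proposed behaviour, using a toy model rather than the server code, shows why pairing the namespace with its own shard version lets the router converge. All names, namespaces, and version strings are hypothetical, and the model assumes the router treats a collection as a tracked legacy timeseries whenever its cached buckets entry is not unsharded.

```python
UNSHARDED = "UNSHARDED"
MAX_RETRIES = 10  # retry budget mentioned in the ticket

class StaleConfig(Exception):
    """Hypothetical stand-in for the shard's stale-version response."""
    def __init__(self, nss):
        super().__init__(f"stale routing info for {nss}")
        self.nss = nss

# Ground truth on the shard: "db.ts" is now a normal, unsharded collection.
SHARD_VERSIONS = {"db.ts": UNSHARDED}

def shard_explain(nss, attached_version):
    # The shard checks the attached version against the namespace it received.
    if SHARD_VERSIONS.get(nss, UNSHARDED) != attached_version:
        raise StaleConfig(nss)
    return "explain output"

def router_explain_fixed(cache, view_ns, buckets_ns):
    for attempt in range(1, MAX_RETRIES + 1):
        # If the cache says "tracked legacy timeseries", translate the command
        # to the buckets namespace; otherwise target the namespace as given.
        tracked = cache.get(buckets_ns, UNSHARDED) != UNSHARDED
        target = buckets_ns if tracked else view_ns
        try:
            # Namespace and shard version now always belong together.
            return shard_explain(target, cache.get(target, UNSHARDED)), attempt
        except StaleConfig as err:
            # The error names the buckets namespace, so that entry is refreshed.
            cache[err.nss] = SHARD_VERSIONS.get(err.nss, UNSHARDED)
    return None, MAX_RETRIES

stale_cache = {"db.ts": UNSHARDED, "db.system.buckets.ts": "v1"}
result, attempts = router_explain_fixed(stale_cache, "db.ts", "db.system.buckets.ts")
print(result, attempts)
```

Here the first attempt is rejected for the buckets namespace, the refresh clears the stale entry, and the second attempt targets the recreated collection directly.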
- is related to: SERVER-113997 Fix legacy timeseries namespace translation in findAndModify explain command (Closed)