- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 8.0.0, 8.2.0
- Component/s: None
- Query Execution
- ALL
Write operations that use the two-phase protocol can fail with a NamespaceNotSharded error when executed against an unsharded collection through a stale router.
The two-phase protocol is used for write operations (updates and deletes) that cannot be directly targeted to a single shard.
The problem happens when the router serving the write is stale, thinks the collection is sharded, and decides to use the two-phase write protocol. When ClusterQueryWithoutShardKey then receives a StaleInfo error from the shard, it refreshes its cache, retries the two-phase protocol, and finally fails with "NamespaceNotSharded".
The root cause is that the ClusterQueryWithoutShardKey command implements a router loop that swallows (intercepts and retries) the StaleInfo error. Instead, the error should bubble up to the write executor so that, after refreshing the cache and restarting the operation, the executor chooses the correct write protocol (single-phase vs. two-phase) according to the refreshed metadata.
After the first failure, executing the write operation again succeeds because the cache has already been updated.
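The behavior described above can be illustrated with a small toy model. This is only a sketch in Python, not the server's C++ code: the Router class, its methods, the swallow_stale flag, and the rule that an unsharded collection takes the single-phase path are all assumptions made for illustration. Only the error names and the overall control flow come from the description.

```python
class StaleInfo(Exception):
    """Hypothetical stand-in for the shard's stale routing info error."""


class NamespaceNotSharded(Exception):
    """Hypothetical stand-in for the error surfaced to the client."""


class Router:
    """Toy router; every name here is illustrative, not a real server API."""

    def __init__(self, catalog, cached_sharded):
        self.catalog = catalog                # authoritative metadata
        self.cached_sharded = cached_sharded  # router's possibly stale view

    def refresh(self):
        self.cached_sharded = self.catalog["sharded"]

    def _shard_check(self):
        # The shard rejects requests built from stale routing info.
        if self.cached_sharded != self.catalog["sharded"]:
            raise StaleInfo()

    def two_phase(self, swallow_stale):
        # Models the inner router loop of the two-phase command.
        while True:
            if not self.cached_sharded:
                raise NamespaceNotSharded()
            try:
                self._shard_check()
                return "two-phase"
            except StaleInfo:
                if not swallow_stale:
                    raise       # proposed: let the write executor handle it
                self.refresh()  # current: refresh, retry the same protocol

    def single_phase(self):
        self._shard_check()
        return "single-phase"

    def write(self, swallow_stale):
        # Models the write executor: it picks the protocol from the cache
        # and restarts the whole operation when it sees StaleInfo.
        while True:
            try:
                if self.cached_sharded:
                    return self.two_phase(swallow_stale)
                return self.single_phase()
            except StaleInfo:
                self.refresh()


def outcome(router, swallow_stale):
    """Return the protocol used, or the name of the error raised."""
    try:
        return router.write(swallow_stale)
    except NamespaceNotSharded:
        return "NamespaceNotSharded"
```

In this model, a router created as Router({"sharded": False}, cached_sharded=True) fails the first write with NamespaceNotSharded when swallow_stale is True, and a second write on the same router succeeds because the cache was refreshed along the way. With swallow_stale set to False, the first write already succeeds, since the executor re-selects the protocol after the refresh.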
- is depended on by
  - SERVER-114458 Disallow nesting of RouterRoles for the same namespaces (Backlog)
- related to
  - SERVER-99481 Enhance test suites that test stale routers (Backlog)