[SERVER-70118] Handle collection sharding state change during _clusterQueryWithoutShardKey Created: 29/Sep/22  Updated: 29/Oct/23  Resolved: 08/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Jason Zhang Assignee: Sanika Phanse (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-76804 Operation hangs when collection dropp... Closed
is related to SERVER-77031 Add test cases for write one on colle... Closed
Assigned Teams:
Sharding NYC
Backwards Compatibility: Fully Compatible
Sprint: Sharding NYC 2023-02-20, Sharding NYC 2023-03-06, Sharding NYC 2023-03-20, Sharding NYC 2023-04-03, Sharding NYC 2023-04-17, Sharding NYC 2023-05-01, Sharding NYC 2023-05-15
Participants:

 Description   

The routing table may be flushed or become stale through chunk moves or a dropped collection, which could cause the chunk manager in _clusterQueryWithoutShardKey to return NamespaceNotFound.

If a collection is dropped before the command runs, uasserting InvalidOptions when checking whether the collection is sharded may prevent retrying on StaleConfig exceptions. It may be necessary to adjust the error handling here.



 Comments   
Comment by Githook User [ 08/May/23 ]

Author: Sanika Phanse <sanika.phanse@mongodb.com> (sphanse99)

Message: SERVER-70118 Handle collection sharding state change during _clusterQueryWithoutShardKey
Branch: master
https://github.com/mongodb/mongo/commit/eeff988cb9d06d713de7aae1aa16a7bea4f80003

Comment by Sanika Phanse (Inactive) [ 28/Mar/23 ]

Discussed offline:

We will introduce a new error code here that is not a TransientTransactionError so that we exit the internal transaction. We will convert the error to a StaleConfig here so that the outer findAndModify command is retried.
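A minimal Python sketch of the flow discussed above, not the server's C++ code. The names CollectionNoLongerSharded, run_internal_transaction, find_and_modify, and refresh_routing are hypothetical stand-ins; only StaleConfig and the idea of converting a non-transient error so the outer command retries come from this ticket.

```python
class StaleConfig(Exception):
    """Stands in for the StaleConfig error that makes the router retry."""


class CollectionNoLongerSharded(Exception):
    """Hypothetical new error code; deliberately not a transient transaction error."""


def run_internal_transaction(query_without_shard_key):
    # Because the error is not transient, the internal transaction is not
    # retried. It exits, and the error is converted to StaleConfig here.
    try:
        return query_without_shard_key()
    except CollectionNoLongerSharded as exc:
        raise StaleConfig(str(exc)) from exc


def find_and_modify(query_without_shard_key, refresh_routing, max_attempts=3):
    # Outer command: on StaleConfig, refresh routing info and try again.
    for attempt in range(1, max_attempts + 1):
        try:
            return run_internal_transaction(query_without_shard_key)
        except StaleConfig:
            if attempt == max_attempts:
                raise
            refresh_routing()
```

The conversion sits at the boundary between the internal transaction and the outer command, so the inner layer never sees an error it would retry on its own.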

Comment by Sanika Phanse (Inactive) [ 28/Feb/23 ]

Splitting this ticket into two:
This ticket is responsible for testing / handling two scenarios:
1) If the routing table gets flushed by the flushRouterConfig command between the cri snapshot in findAndModify and the cri snapshot in _clusterQueryWithoutShardKey, the routing table gets correctly refreshed.
2) If a collection changes from sharded to unsharded between findAndModify's cri snapshots, we throw StaleConfig here, and ensure this error is propagated.
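
The two scenarios above can be modeled with a toy router cache. This is only an illustrative Python sketch: the Router class, its fields, and cluster_query_without_shard_key are assumptions made for the example and do not mirror the server's actual classes or test code.

```python
class StaleConfig(Exception):
    """Stands in for the StaleConfig error named in this ticket."""


class Router:
    """Toy router cache (hypothetical); maps a namespace to 'is sharded'."""

    def __init__(self, config):
        self.config = config      # authoritative metadata: ns -> is_sharded
        self.cache = {}           # cached routing info on the router
        self.refresh_count = 0

    def flush_router_config(self):
        # Models flushRouterConfig: drop all cached routing info.
        self.cache.clear()

    def get_routing_info(self, ns):
        # Take a routing snapshot, refreshing from the config if not cached.
        if ns not in self.cache:
            self.cache[ns] = self.config[ns]
            self.refresh_count += 1
        return self.cache[ns]


def cluster_query_without_shard_key(router, ns):
    # Second snapshot: if the collection is no longer sharded, raise
    # StaleConfig (rather than InvalidOptions) so the caller can retry.
    if not router.get_routing_info(ns):
        raise StaleConfig(f"{ns} is no longer sharded")
    return "targeted"
```

In scenario 1, a flush between the two snapshots only forces a refresh and the query proceeds. In scenario 2, the second snapshot sees an unsharded collection and StaleConfig propagates to the caller.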

https://jira.mongodb.org/browse/SERVER-74441 is the precursor to this ticket.

Comment by Sanika Phanse (Inactive) [ 27/Feb/23 ]

cm.isSharded() might change to false between clusterQuery invocation and clusterQuery execution.

Change InvalidOptions to StaleConfig to trigger a retry.

Generated at Thu Feb 08 06:15:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.