[SERVER-66031] Command(s) hang when specifying collectionUUID for unsharded collection on sharded cluster Created: 27/Apr/22 Updated: 29/Oct/23 Resolved: 16/May/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.0.0-rc6, 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Evgeni Dobranov | Assignee: | Pierlauro Sciarelli |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v6.0 |
| Steps To Reproduce: | ^ We can confirm that the collection doesn't exist on the participant shard since the error contains `actualCollection: null` |
| Sprint: | Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30 |
| Participants: | |
| Linked BF Score: | 197 |
| Description |
|
Rename hangs when the collectionUUID parameter is specified for an unsharded collection on a sharded cluster that has participant shard(s). Other commands with logic similar to Rename (e.g. possibly Drop and ShardCollection) could also be affected, but the hang was only attempted and observed with Rename.

From a conversation with max.hirschhorn@mongodb.com, the problem seems to lie in Rename's policy of broadcasting the _shardsvrRenameCollectionParticipant command with expectedSourceUUID and expectedTargetUUID to all shards, even those that don't own any data for the collection.

Upon issuing a Rename, the unsharded collection's UUID is successfully found on the coordinator shard in the kCheckPreconditions phase, which allows the coordinator to continue to the kFreezeMigrations and kBlockCrudAndRename phases. The coordinator shard then sends the Rename to the participant(s) as-is, including the collectionUUID parameter. This always results in a CollectionUUIDMismatch error, because the collection is unsharded and therefore doesn't exist on the participant.

The coordinator shard must retry _shardsvrRenameCollectionParticipant until it succeeds on all participants, to avoid the Rename succeeding on only some shards. But the Rename can never succeed when the collection doesn't exist on the participant and the collectionUUID parameter has been specified, so the command hangs. |
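The failure mode described above can be sketched as a toy model in Python. This is not server code: the `Shard` class, `rename_participant`, `coordinate_rename`, and the `max_attempts` cap are all hypothetical names introduced for illustration, and the real coordinator retries without a cap (which is why it hangs rather than returning).

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class CollectionUUIDMismatch(Exception):
    """Toy stand-in for the server's CollectionUUIDMismatch error."""


@dataclass
class Shard:
    name: str
    # namespace -> UUID of the collections this shard holds locally
    collections: Dict[str, uuid.UUID] = field(default_factory=dict)

    def rename_participant(self, source: str, target: str,
                           expected_source_uuid: Optional[uuid.UUID]) -> None:
        """Hypothetical model of the participant-side rename."""
        actual = self.collections.get(source)
        if expected_source_uuid is not None and actual != expected_source_uuid:
            # A shard with no data for the collection always lands here
            # (the ticket reports `actualCollection: null` in this case).
            raise CollectionUUIDMismatch(f"{self.name}: UUID mismatch for {source}")
        if actual is not None:
            self.collections[target] = self.collections.pop(source)


def coordinate_rename(primary: Shard, participants: List[Shard], source: str,
                      target: str, expected_uuid: Optional[uuid.UUID],
                      max_attempts: int = 5) -> bool:
    """Return True if every shard acknowledged the rename.

    Returning False stands in for the hang: the real coordinator would
    keep retrying instead of giving up after max_attempts (an assumption
    made here so the example terminates).
    """
    # Precondition check on the primary: passes, because the primary shard
    # owns the unsharded collection.
    if expected_uuid is not None and primary.collections.get(source) != expected_uuid:
        raise CollectionUUIDMismatch(f"{primary.name}: UUID mismatch for {source}")
    for shard in [primary, *participants]:
        for _ in range(max_attempts):
            try:
                # The UUID is forwarded as-is to every shard.
                shard.rename_participant(source, target, expected_uuid)
                break
            except CollectionUUIDMismatch:
                continue  # retry; on a data-less shard this never succeeds
        else:
            return False
    return True
```

In this model, a cluster with one data-less participant never completes the rename when a UUID is supplied, while the same rename without a UUID completes on the first attempt.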
| Comments |
| Comment by Githook User [ 17/May/22 ] |
|
Author: {'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}Message: |
| Comment by Githook User [ 16/May/22 ] |
|
Author: {'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}Message: |
| Comment by Githook User [ 16/May/22 ] |
|
Author: {'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}Message: |
| Comment by Wenbin Zhu [ 27/Apr/22 ] |
|
I think we will need this fix for 6.0.0 |
| Comment by Max Hirschhorn [ 27/Apr/22 ] |
|
Thanks evgeni.dobranov@mongodb.com, I edited the description to clarify the names of the phases for RenameCollectionCoordinator slightly. I suspect the solution here will be to not forward collectionUUID, expectedSourceUUID, expectedTargetUUID, etc. to the participant shards because the coordinator shard as the primary shard can be authoritative about the collection existing, or existing with a different namespace string, or not existing. CC tommaso.tocci@mongodb.com |
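The direction suggested in this comment (validate the UUID once on the primary shard and do not forward it to participants) might look like the following toy sketch. It only illustrates the idea as stated here and is not the change that was actually committed; `participant_rename` and `coordinate_rename_without_forwarding` are hypothetical names, and shards are modeled as plain dictionaries mapping namespace to UUID.

```python
import uuid
from typing import Dict, List, Optional


class CollectionUUIDMismatch(Exception):
    """Toy stand-in for the server's CollectionUUIDMismatch error."""


def participant_rename(local: Dict[str, uuid.UUID], source: str, target: str,
                       expected_uuid: Optional[uuid.UUID] = None) -> None:
    """Hypothetical participant step: check the UUID only if one is given."""
    actual = local.get(source)
    if expected_uuid is not None and actual != expected_uuid:
        raise CollectionUUIDMismatch(f"UUID mismatch for {source}")
    if actual is not None:
        local[target] = local.pop(source)


def coordinate_rename_without_forwarding(primary: Dict[str, uuid.UUID],
                                         participants: List[Dict[str, uuid.UUID]],
                                         source: str, target: str,
                                         expected_uuid: Optional[uuid.UUID]) -> None:
    # The primary shard is treated as authoritative: the UUID is checked here, once.
    if expected_uuid is not None and primary.get(source) != expected_uuid:
        raise CollectionUUIDMismatch(f"UUID mismatch for {source}")
    # The UUID is deliberately not forwarded, so shards that own no data
    # for the collection acknowledge the rename instead of failing.
    for local in [primary, *participants]:
        participant_rename(local, source, target, expected_uuid=None)
```

A caller that passes the wrong UUID still gets a mismatch error from the primary-side check, so the user-visible guarantee of the parameter is kept in this model.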
| Comment by Evgeni Dobranov [ 27/Apr/22 ] |
|
max.hirschhorn@mongodb.com feel free to clarify anything I wrote above, or edit the description directly if I got any of the details wrong