[SERVER-66031] Command(s) hang when specifying collectionUUID for unsharded collection on sharded cluster
Created: 27/Apr/22  Updated: 29/Oct/23  Resolved: 16/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc6, 6.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Evgeni Dobranov
Assignee: Pierlauro Sciarelli
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
is related to SERVER-66474 Consider increasing number of shards ... Backlog
is related to SERVER-62455 Add collectionUUID parameter to renam... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0
Steps To Reproduce:
  1. Create a sharded cluster with >= 2 shards.
  2. Create an unsharded collection on the primary shard.
  3. Get the unsharded collection's UUID.
  4. Issue a Rename command to rename the collection to something else while specifying the collectionUUID parameter with the UUID from step 3.
  5. The command will hang. A server log line like this can be observed about once every second on a participant shard:

{"t":{"$date":"2022-04-27T16:15:22.026-04:00"},"s":"E", "c":"SHARDING", "id":6372200, "ctx":"RenameCollectionParticipantService-5","msg":"Error executing rename collection participant. Going to be retried.","attr":{"fromNs":"myDB.B","toNs":"myDB.A","error":"CollectionUUIDMismatch{ db: \"myDB\", collectionUUID: UUID(\"08d3be0f-f7e4-402e-b6ac-c6998299df57\"), expectedCollection: \"B\", actualCollection: null }: Collection UUID does not match that specified"}}

The error above contains `actualCollection: null`, which confirms that the collection doesn't exist on the participant shard.
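
Steps 3 and 4 can be sketched in the mongo shell roughly as follows. This is a minimal sketch, assuming a shell connected to a mongos. The helper name `buildRenameWithUUID` is hypothetical (it is not part of the server or the shell), and the namespaces `myDB.B` and `myDB.A` are taken from the log line above.

```javascript
// Hypothetical helper (an assumption for illustration): builds the
// renameCollection command from step 4, carrying the UUID from step 3.
// `db` is assumed to be a shell database handle for the collection's database.
function buildRenameWithUUID(db, fromColl, toColl) {
    // Step 3: look up the source collection's UUID.
    const infos = db.getCollectionInfos({name: fromColl});
    if (infos.length === 0) {
        throw new Error("collection not found: " + fromColl);
    }
    const uuid = infos[0].info.uuid;

    // Step 4: rename with the collectionUUID parameter set.
    const dbName = db.getName();
    return {
        renameCollection: dbName + "." + fromColl,
        to: dbName + "." + toColl,
        collectionUUID: uuid,
    };
}

// Usage in a shell connected to mongos. On an affected version this is the
// call that is expected to hang:
//   const myDB = db.getSiblingDB("myDB");
//   myDB.adminCommand(buildRenameWithUUID(myDB, "B", "A"));
```

The helper only builds the command document, so it can be inspected before being passed to `adminCommand`.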

Sprint: Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30
Participants:
Linked BF Score: 197

 Description   

A Rename hangs when the collectionUUID parameter is specified for an unsharded collection on a sharded cluster with participant shard(s). Other commands with logic similar to Rename may also be affected (e.g. possibly Drop and ShardCollection), but the hang was only attempted and observed with Rename.

From a conversation with max.hirschhorn@mongodb.com, the problem appears to lie in Rename's policy of broadcasting the _shardsvrRenameCollectionParticipant command, with the expectedSourceUUID and expectedTargetUUID, to all shards, even those that don't own any data for the collection. Upon issuing a Rename, the unsharded collection's UUID is successfully found on the coordinator shard in the kCheckPreconditions phase, which allows the coordinator to continue to the kFreezeMigrations and kBlockCrudAndRename phases. The coordinator shard then sends the Rename to the participant(s) as-is, with the collectionUUID parameter. This always results in a CollectionUUIDMismatch error on a participant, because the collection is unsharded and therefore doesn't exist there. The coordinator shard must retry _shardsvrRenameCollectionParticipant until it succeeds on all of the participants, so that the Rename never succeeds on only some of the shards. But the Rename can never succeed when the collection doesn't exist on the participant and the collectionUUID parameter has been specified, so the retries continue indefinitely and the command hangs.
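
The retry behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions, not server code: shards are modelled as plain objects with a `catalog` Map from UUID to collection name, the function names are hypothetical, and the retry loop is capped at `maxAttempts` so the example terminates (per the description, the real coordinator keeps retrying).

```javascript
// Toy model only; none of these names exist in the server.
class CollectionUUIDMismatch extends Error {}

// Participant step: when an expected UUID is supplied, the shard checks it
// against its own local catalog before renaming anything.
function participantRename(shard, request) {
    if (request.expectedSourceUUID !== undefined &&
        !shard.catalog.has(request.expectedSourceUUID)) {
        throw new CollectionUUIDMismatch(
            "Collection UUID does not match that specified (actualCollection: null)");
    }
    // Rename the collection locally if this shard has it.
    for (const [uuid, name] of shard.catalog) {
        if (name === request.from) {
            shard.catalog.set(uuid, request.to);
        }
    }
}

// Coordinator step: forwards the request as-is and retries every shard that
// has not succeeded yet. The attempt cap is an assumption of this sketch.
function coordinatorRename(shards, request, maxAttempts) {
    const pending = new Set(shards.map((s) => s.name));
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        for (const shard of shards) {
            if (!pending.has(shard.name)) continue;
            try {
                participantRename(shard, request);
                pending.delete(shard.name);
            } catch (e) {
                if (!(e instanceof CollectionUUIDMismatch)) throw e;
                // Mismatch: leave the shard pending so it is retried.
            }
        }
        if (pending.size === 0) {
            return {done: true, attempts: attempt, pending: []};
        }
    }
    return {done: false, attempts: maxAttempts, pending: [...pending]};
}

const uuid = "08d3be0f-f7e4-402e-b6ac-c6998299df57";
const primary = {name: "shard0", catalog: new Map([[uuid, "B"]])};
const other = {name: "shard1", catalog: new Map()}; // owns no data for "B"
const result = coordinatorRename(
    [primary, other], {from: "B", to: "A", expectedSourceUUID: uuid}, 5);
console.log(result); // { done: false, attempts: 5, pending: [ 'shard1' ] }
```

In this model the shard that owns the collection succeeds on the first attempt, while the shard with an empty catalog fails the UUID check on every attempt, which mirrors the repeated "Going to be retried" log line.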



 Comments   
Comment by Githook User [ 17/May/22 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-66031 fix fcv document retrieval in jstests/sharding/rename_sharded.js
Branch: v6.0
https://github.com/mongodb/mongo/commit/76fce4412c973c738a02333813dc394191a826f1

Comment by Githook User [ 16/May/22 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-66031 rename must succeed on all shards when UUIDs provided for C2C
Branch: v6.0
https://github.com/mongodb/mongo/commit/feb2dfc188dd4108224c85ab03b0dd0d7ceaa8ea

Comment by Githook User [ 16/May/22 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-66031 rename must succeed on all shards when UUIDs provided for C2C
Branch: master
https://github.com/mongodb/mongo/commit/677d03ddb3fb888f456624c00fbaef7cb593c979

Comment by Wenbin Zhu [ 27/Apr/22 ]

I think we will need this fix for 6.0.0

cc lingzhi.deng@mongodb.com 

Comment by Max Hirschhorn [ 27/Apr/22 ]

Thanks evgeni.dobranov@mongodb.com, I edited the description to clarify the names of the phases for RenameCollectionCoordinator slightly. I suspect the solution here will be to not forward collectionUUID, expectedSourceUUID, expectedTargetUUID, etc. to the participant shards because the coordinator shard as the primary shard can be authoritative about the collection existing, or existing with a different namespace string, or not existing. CC tommaso.tocci@mongodb.com
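
The approach suggested in this comment can be sketched with a toy model. This is only an illustration of the suggestion, under stated assumptions, and not necessarily what the linked commits implement: shards are plain objects with a `catalog` Map from UUID to collection name, and all function names are hypothetical.

```javascript
// Toy model only; none of these names exist in the server.
class CollectionUUIDMismatch extends Error {}

// Participant step: enforces an expected UUID only if one is forwarded.
function participantRename(shard, request) {
    if (request.expectedSourceUUID !== undefined &&
        !shard.catalog.has(request.expectedSourceUUID)) {
        throw new CollectionUUIDMismatch("Collection UUID does not match that specified");
    }
    for (const [uuid, name] of shard.catalog) {
        if (name === request.from) {
            shard.catalog.set(uuid, request.to);
        }
    }
}

// Drops the UUID-related fields named in the comment before forwarding.
function stripUUIDFields(request) {
    const {collectionUUID, expectedSourceUUID, expectedTargetUUID, ...forwarded} = request;
    return forwarded;
}

// Coordinator step: the primary shard is treated as authoritative, so the
// UUID is checked once there and not forwarded to the participants.
function coordinatorRename(primary, participants, request) {
    if (request.expectedSourceUUID !== undefined &&
        primary.catalog.get(request.expectedSourceUUID) !== request.from) {
        throw new CollectionUUIDMismatch("Collection UUID does not match that specified");
    }
    const forwarded = stripUUIDFields(request);
    for (const shard of [primary, ...participants]) {
        participantRename(shard, forwarded);
    }
    return forwarded;
}

const uuid = "08d3be0f-f7e4-402e-b6ac-c6998299df57";
const primary = {name: "shard0", catalog: new Map([[uuid, "B"]])};
const other = {name: "shard1", catalog: new Map()}; // owns no data for "B"
const forwarded = coordinatorRename(
    primary, [other], {from: "B", to: "A", expectedSourceUUID: uuid});
console.log(forwarded); // { from: 'B', to: 'A' }
```

In this model a wrong UUID is still rejected, but only once, by the coordinator, so a participant with no local copy of the collection no longer fails the request.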

Comment by Evgeni Dobranov [ 27/Apr/22 ]

max.hirschhorn@mongodb.com feel free to clarify anything I wrote above, or edit the description directly if I got any of the details wrong.

(cc wenbin.zhu@mongodb.com gregory.noma@mongodb.com)
