[SERVER-67725] Check catalog consistency on shards as precondition for rename Created: 01/Jul/22  Updated: 29/Oct/23  Resolved: 02/Aug/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.13, 6.0.2, 6.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Pierlauro Sciarelli Assignee: Enrico Golfieri
Resolution: Fixed Votes: 0
Labels: shardingemea-qw
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Backwards Compatibility: Fully Compatible
Backport Requested:
v6.0, v5.0
Sprint: Sharding EMEA 2022-07-25, Sharding EMEA 2022-08-08
Participants:
Case:
Story Points: 2

 Description   

To ensure the correctness of renameCollection for sharded collections (supported since v5.0), we introduced logic in the rename coordinator/participant to make sure UUIDs are aligned across all shards.
If a catalog inconsistency is detected (namely, different UUIDs for the source/target collection on different shards), the rename operation hangs and spams the logs with a message intended to prompt the user to intervene manually.

This is an example of the error emitted in the logs:

{"t":{"$date":"2022-05-21T00:37:40.719Z"},"s":"E","c":"SHARDING","id":6372200,"ctx":"RenameCollectionParticipantService-223","msg":"Error executing rename collection participant. Going to be retried.","attr":{"fromNs":"foo.sourceColl","toNs":"foo.TargetColl","error":"CommandFailed: Source Collection foo.sourceColl UUID does not match provided uuid."}}

Since a number of users hit this error and ended up with a stuck rename, not knowing how to fix the catalog inconsistency, the purpose of this ticket is to prevent this situation from arising.

One possible approach is to broadcast a command to all shards in the checkPreconditions phase so that the operation fails early if an inconsistency is detected (e.g. run listCollections filtered by namespace on all shards).

This would not fully prevent the hang, because after the preconditions are checked and before the participants are instantiated, a direct client could still create the source/target collection with a different UUID on another shard. However, the window for this bad interleaving would be short enough to prevent 99% of the hangs.
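
The precondition check proposed above can be illustrated with a minimal Python sketch. It is not the server implementation (which is C++ inside the rename coordinator); it only models the comparison step. The function names, the exception type, and the per-shard input shape are hypothetical. The input is assumed to be what a listCollections filtered by collection name would return on each shard, with the UUID under info.uuid and an empty reply when the shard does not have the collection. Treating an absent collection as consistent is also an assumption; the ticket only names differing UUIDs as the inconsistency.

```python
from typing import Dict, List, Optional


class CatalogInconsistencyError(Exception):
    """Hypothetical error: shards report different UUIDs for one namespace."""


def extract_uuid(list_collections_reply: List[dict]) -> Optional[str]:
    """Return the UUID from a listCollections-style reply.

    An empty reply means the collection does not exist on that shard.
    """
    if not list_collections_reply:
        return None
    return list_collections_reply[0]["info"]["uuid"]


def check_uuid_consistency(
    ns: str, replies_by_shard: Dict[str, List[dict]]
) -> Optional[str]:
    """Fail early if shards disagree on the UUID of `ns`.

    Returns the single agreed UUID, or None if no shard has the collection.
    Assumption: shards without the collection are ignored, not flagged.
    """
    uuids_by_shard = {
        shard: extract_uuid(reply) for shard, reply in replies_by_shard.items()
    }
    distinct = {u for u in uuids_by_shard.values() if u is not None}
    if len(distinct) > 1:
        raise CatalogInconsistencyError(
            f"Inconsistent UUIDs for {ns} across shards: {uuids_by_shard}"
        )
    return next(iter(distinct), None)


# Illustrative replies (made-up shard names and UUID strings).
consistent_replies = {
    "shard0": [{"name": "sourceColl", "info": {"uuid": "uuid-A"}}],
    "shard1": [{"name": "sourceColl", "info": {"uuid": "uuid-A"}}],
    "shard2": [],  # collection absent on this shard
}
inconsistent_replies = {
    "shard0": [{"name": "sourceColl", "info": {"uuid": "uuid-A"}}],
    "shard1": [{"name": "sourceColl", "info": {"uuid": "uuid-B"}}],
}
```

Running the check once for the source namespace and once for the target namespace before participants are created would turn the retry loop shown in the log above into an immediate, user-visible failure, subject to the race described in the previous paragraph.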



 Comments   
Comment by Githook User [ 01/Sep/22 ]

Author:

{'name': 'Enrico Golfieri', 'email': 'enrico.golfieri@mongodb.com', 'username': 'enricogolfieri'}

Message: SERVER-67725 Check catalog consistency on shards as precondition for rename (cherry picked from e249de58449ebc1d3599b44c26dcdb342376b413)
Branch: v5.0
https://github.com/mongodb/mongo/commit/e47b338c23a4ef576d62f7bae7bf6895a2db693e

Comment by Githook User [ 29/Jul/22 ]

Author:

{'name': 'Enrico Golfieri', 'email': 'enrico.golfieri@mongodb.com', 'username': 'enricogolfieri'}

Message: SERVER-67725 check uuid consistency over all participants for renameCollection
Branch: master
https://github.com/mongodb/mongo/commit/e249de58449ebc1d3599b44c26dcdb342376b413
