[SERVER-76624] Server can falsely report CollectionUUIDMismatch Created: 27/Apr/23 Updated: 01/Feb/24 Resolved: 01/Feb/24 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Michael McClimon | Assignee: | Jordi Olivares Provencio |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | car-qw | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Assigned Teams: |
Catalog and Routing
|
||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||
| Sprint: | Execution NAMR Team 2023-07-10, Sharding EMEA 2023-07-24, Sharding EMEA 2023-08-07, Sharding EMEA 2023-08-21, Sharding EMEA 2023-09-04, Sharding EMEA 2023-09-18, Sharding EMEA 2023-10-02, Sharding EMEA 2023-10-16, Sharding EMEA 2023-10-30, CAR Team 2023-11-13, CAR Team 2023-11-27, CAR Team 2023-12-11, CAR Team 2023-12-25, CAR Team 2024-01-08, CAR Team 2024-02-05 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Story Points: | 2 | ||||||||||||||||||||
| Description |
|
In a mongosync test run, mongod mistakenly reported a collectionUUIDMismatch error. The sequence of events was:
The mongod returns this very confusing error, with the same UUID and the same collection name in both expected and actual collection:
I collected some of the relevant logs in this GitHub gist, but the full mongod logs are available from the Evergreen task page for the failed mongosync tests. |
| Comments |
| Comment by Jordi Olivares Provencio [ 01/Feb/24 ] | ||||||||||
|
rohan.sharan@mongodb.com I can more or less confirm that if you were to use afterClusterTime to enforce causal consistency this issue would be fixed. Closing this as Won't Fix as there's nothing else on our end to solve. | ||||||||||
| Comment by Rohan Sharan [ 19/Jul/23 ] | ||||||||||
|
Yeah actually apparently it uses read preference nearest by default on the source cluster, which is where SampleCollection is being run. It doesn't use causal consistency, and it never would be able to run source collection creation and the aggregation in the same session, because the source collection creation is not done by mongosync (mongosync doesn't write to the source cluster).
I created a ticket REP-2956 in mongosync to consider any of the following changes:
Do the all of the 3 options mentioned above individually seem like they'll take care of the issue?
Also, despite this popping up because mongosync is reading from a secondary, isn't this incorrect server behavior? | ||||||||||
| Comment by Jordi Serra Torrens [ 19/Jul/23 ] | ||||||||||
|
My reading of the latest logs linked in this comment is the same huayu.ouyang@mongodb.com concluded above. It appears that mongosync targeted an aggregation to a secondary shard node that was in the process of replicating the collection creation (possibly still hadn't, but I can't really tell from the logs), therefore it did not know about the expectedUUID so it threw CollectionUUIDMismatch.
rohan.sharan@mongodb.com Can you confirm whether mongosync's SampleCollection performs secondary reads? If so, does it use causal consistent sessions? Also, did the source collection creation happen on the same client/session as mongosync is running the aggregate or a different one? Otherwise we'd need mongosync to pass a causal consistency token that is inclusive of the collection creation so that later the aggregation can be guaranteed to see the effects of the collection creation. | ||||||||||
| Comment by Huayu Ouyang [ 07/Jul/23 ] | ||||||||||
|
I have tried to reproduce this bug but have not been able to. Since this occurred only once back in April, it might be possible that changes to how AutoGetCollection and lock free reads work since then like https://github.com/10gen/mongo/commit/e7b145405e5e076ebe59354a34cb3a74a67e2c5e or https://github.com/10gen/mongo/commit/014850f5f9d0e2cd9490755e94fe7a17570f7a70 have also solved the issue. Additionally, Closing this as Cannot Reproduce, but if this reoccurs feel free to re-open or file a new ticket. | ||||||||||
| Comment by Huayu Ouyang [ 27/Jun/23 ] | ||||||||||
|
From srcCluster-rs-27020.log (which is a secondary) it seems like we successfully create the userWriteBlock coll
Immediately afterwards we get this user assertion:
This comes from here. It seems like we're uasserting because the collection we get vs the UUID that is passed in don't have the same UUID, but then when we lookup the namespace of the UUID that is passed in, we DO get a namespace that is the same as the expected namespace. On mongosync we do writes with majority write concern, but we're doing a read on the secondary right now, so it's possible the write didn't reach this secondary. However, we should be seeing a consistent view of the catalog throughout this operation, so when we lookup of the namespace of the passed in UUID, it should have not returned any namespace if the collection wasn't created yet. I looked at the CollectionCatalog::lookupNSSByUUID function and it seems like it first does UncommittedCatalogUpdates::lookupCollection so I'm wondering if this might be getting the new collection that was created? Talked to gregory.noma@mongodb.com offline and we noted that related tickets REP-832 and/or REP-2743 might solve this on the mongosync side but I will continue investigating why this is occurring on the server side. |