ISSUE DESCRIPTION AND IMPACT
When running a pipeline that contains a $lookup, $graphLookup, or $unionWith stage followed by a $merge or $out stage, where all of the following conditions are true:
- $merge or $out is writing to a different database
- the collection name is the same for both the foreign collection in the $lookup, $graphLookup, or $unionWith and the target collection for the $merge or $out
the $lookup, $graphLookup, or $unionWith will non-deterministically and incorrectly read from the collection in the database that is being written to. This results in missing or incorrect data being written to the output collection of the $merge or $out pipeline.
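As a rough aid for auditing existing pipelines, the conditions above can be expressed as a plain JavaScript check. This is a minimal sketch, not a MongoDB API: the function names (outputTarget, foreignCollections, matchesIssueConditions) are hypothetical, and it assumes the output stage is the last stage, that foreign collections are given as string names, and that only sub-pipelines of $lookup and $unionWith need to be searched.

```javascript
// Hypothetical helpers (not part of MongoDB or mongosh).

// Resolve the { db, coll } written by a trailing $out/$merge stage, or null.
function outputTarget(currentDb, pipeline) {
  const last = pipeline[pipeline.length - 1] || {};
  let spec = null;
  if (last.$out !== undefined) {
    spec = last.$out;
  } else if (last.$merge !== undefined) {
    // $merge accepts either a bare collection name or an object with "into".
    spec = typeof last.$merge === "string" ? last.$merge : last.$merge.into;
  }
  if (spec === null || spec === undefined) return null;
  if (typeof spec === "string") return { db: currentDb, coll: spec };
  return { db: spec.db !== undefined ? spec.db : currentDb, coll: spec.coll };
}

// Collect foreign collection names from $lookup, $graphLookup and $unionWith.
function foreignCollections(pipeline) {
  const names = [];
  for (const stage of pipeline) {
    if (stage.$lookup) {
      if (typeof stage.$lookup.from === "string") names.push(stage.$lookup.from);
      if (Array.isArray(stage.$lookup.pipeline)) {
        names.push(...foreignCollections(stage.$lookup.pipeline));
      }
    } else if (stage.$graphLookup) {
      names.push(stage.$graphLookup.from);
    } else if (stage.$unionWith !== undefined) {
      const u = stage.$unionWith;
      if (typeof u === "string") {
        names.push(u);
      } else {
        if (typeof u.coll === "string") names.push(u.coll);
        if (Array.isArray(u.pipeline)) names.push(...foreignCollections(u.pipeline));
      }
    }
  }
  return names;
}

// True when the output goes to a different database AND a foreign
// collection shares the output collection's name.
function matchesIssueConditions(currentDb, pipeline) {
  const target = outputTarget(currentDb, pipeline);
  if (!target || target.db === currentDb) return false;
  return foreignCollections(pipeline).includes(target.coll);
}
```

Calling matchesIssueConditions("lookupDB", pipeline) with the database the aggregation runs against returns true only when both conditions hold; a false result from this sketch is not a guarantee that a pipeline is unaffected.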
DIAGNOSIS AND AFFECTED VERSIONS
Affected versions: MongoDB 6.0, 7.0 through 7.0.21, and 8.0 through 8.0.9.
MongoDB 7.0.22, 8.0.10, and all later versions contain fixes for these issues.
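The version list can be turned into a quick check. The following is a minimal sketch under stated assumptions: classifyServerVersion is a hypothetical helper, it treats every 6.0.x release as affected because no fixed 6.0 release is listed here, and it conservatively reports "unknown" for any release series this section does not name explicitly.

```javascript
// Hypothetical helper (not part of MongoDB): classify a server version string
// against the affected and fixed versions listed in this section.
function classifyServerVersion(version) {
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!m) return "unknown";
  const [major, minor, patch] = m.slice(1).map(Number);
  // Assumption: all 6.0.x releases are affected (no fixed 6.0 release is listed).
  if (major === 6 && minor === 0) return "affected";
  // 7.0 through 7.0.21 affected; 7.0.22 and later 7.0.x fixed.
  if (major === 7 && minor === 0) return patch <= 21 ? "affected" : "fixed";
  // 8.0 through 8.0.9 affected; 8.0.10 and later 8.0.x fixed.
  if (major === 8 && minor === 0) return patch <= 9 ? "affected" : "fixed";
  // Any other series is not named explicitly here, so stay conservative.
  return "unknown";
}
```

In mongosh, the version string can be obtained with db.version(), for example classifyServerVersion(db.version()).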
REMEDIATION AND WORKAROUNDS
Customers are advised to upgrade to the latest version that contains the fix. For MongoDB Atlas customers on 7.0 and 8.0, your Atlas clusters have already been automatically upgraded to a version containing the fix.
A workaround is available that does not require upgrading the cluster:
To immediately avoid this issue, customers can change the name of the output collection for $out/$merge.
For $out pipelines, customers can change the name of the output collection and run the pipeline again to persist the correct results since $out atomically drops the existing collection (if one exists) and creates a new collection.
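The renaming workaround can be sketched as a small pipeline rewrite. This is an illustration only: withRenamedOutput is a hypothetical helper, not a MongoDB API, and it assumes the $out or $merge stage is the last stage of the pipeline. The new collection name passed in must differ from every foreign collection name used by $lookup, $graphLookup, or $unionWith.

```javascript
// Hypothetical helper (not part of MongoDB): return a copy of the pipeline
// whose trailing $out/$merge stage writes to `newColl` instead.
function withRenamedOutput(pipeline, newColl) {
  const copy = pipeline.slice();
  const idx = copy.length - 1;
  const last = copy[idx];
  if (last && last.$out !== undefined) {
    const out = last.$out;
    // $out takes either a collection name or a { db, coll } object.
    copy[idx] = { $out: typeof out === "string" ? newColl : { ...out, coll: newColl } };
  } else if (last && last.$merge !== undefined) {
    const m = last.$merge;
    if (typeof m === "string") {
      copy[idx] = { $merge: newColl };
    } else {
      // Keep every other $merge option (on, whenMatched, ...) unchanged.
      const into = typeof m.into === "string" ? newColl : { ...m.into, coll: newColl };
      copy[idx] = { $merge: { ...m, into } };
    }
  } else {
    throw new Error("pipeline does not end in $out or $merge");
  }
  return copy;
}
```

The returned pipeline can then be passed to aggregate() in place of the original; the input pipeline is left unmodified.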
-------------------------------------------------------
Original description
When running a $lookup, $unwind, $merge pipeline, where $merge is writing to a different database and the $lookup foreign collection and $merge output collection have the same name, the $lookup will sometimes read from the collection in the database we are merging into.
Shell repro (based on naafiyan.ahmed@mongodb.com's repro from HELP-65740):
1) Create the merge collection in mergeDB
use mergeDB
db.bar.insertMany([{"name": "b"}, {"name": "c"}])
2) Create the local and foreign collection in lookupDB and run the aggregation
use lookupDB
db.foo.insertMany([{"_id": "a"}, {"_id": "b"}, {"_id": "c"}])
db.bar.insert({"name": "a"})
db.foo.explain("executionStats").aggregate([
  {"$lookup": {"from": "bar", "as": "same_name_docs", "localField": "_id", "foreignField": "name"}},
  {"$unwind": {"path": "$same_name_docs"}},
  {"$merge": {"into": {"db": "mergeDB", "coll": "bar"}, "on": "_id", "whenMatched": "replace", "whenNotMatched": "insert"}}
])
nReturned for the $lookup stage will usually be 1, but will sometimes be 2 because the stage read from mergeDB.bar instead of lookupDB.bar.
If you replace the $merge with an $out into a collection with the same name in a foreign database, the bug will also reproduce.
- depends on: SERVER-96197 ExpressionContext's _resolvedNamespaces can't distinguish between collections with the same name in different dbs (Closed)
- is related to: SERVER-96197 ExpressionContext's _resolvedNamespaces can't distinguish between collections with the same name in different dbs (Closed)