[SERVER-39166] $graphLookup should force a pipeline to split in sharded cluster
Created: 23/Jan/19  Updated: 29/Oct/23  Resolved: 29/Jan/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.4.0, 3.6.0, 4.0.0
Fix Version/s: 3.4.20, 3.6.11, 4.0.7, 4.1.8

Type: Bug
Priority: Blocker - P1
Reporter: Charlie Swanson
Assignee: Martin Neupauer
Resolution: Fixed
Votes: 0
Labels: SWNA
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.0, v3.6, v3.4
Sprint: Query 2019-02-11
Participants:

 Description   

The $graphLookup stage does not inherit from NeedsMergerDocumentSource, which means it is eligible to execute in parallel on all shards. This produces incorrect results because each shard only searches its local data for the 'from' collection, so shards other than the primary shard find nothing. Instead, $graphLookup should force the pipeline to split and execute in the merging half, so that it always runs on the primary shard for the database.

The $graphLookup StageConstraints correctly state that it must run on the primary shard, but that constraint only takes effect if some earlier stage has already forced the pipeline to split.
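
To make the failure mode concrete, here is a toy model in plain JavaScript. It is not MongoDB code: the shard names, the data layout, and the functions graphLookup, runUnsplit, and runSplit are all hypothetical, and the traversal is heavily simplified. It only illustrates why running the stage on every shard yields empty results wherever the unsharded 'from' collection is absent, and why running it in the merging half on the primary shard does not.

```javascript
// Toy model only. Shard names, data, and function names are hypothetical.
const shards = {
  // Assumed primary shard: the only one holding the unsharded "bar" collection.
  shardA: { foo: [{ _id: "a" }, { _id: "b" }], bar: [{ _id: 1, x: 1 }] },
  // Holds part of the sharded "foo" collection but no copy of "bar".
  shardB: { foo: [{ _id: "c" }, { _id: "d" }], bar: [] },
};
const primaryShard = "shardA";

// Stand-in for {$project: {l_x: {$literal: 1}}}.
const project = (doc) => ({ ...doc, l_x: 1 });

// Simplified $graphLookup: repeatedly match connectToField against the
// current frontier, then follow connectFromField, within `from` only.
function graphLookup(doc, from, startWith, connectFromField, connectToField) {
  const found = [];
  const seen = new Set();
  let frontier = [startWith];
  while (frontier.length > 0) {
    const next = [];
    for (const value of frontier) {
      for (const candidate of from) {
        if (candidate[connectToField] === value && !seen.has(candidate._id)) {
          seen.add(candidate._id);
          found.push(candidate);
          next.push(candidate[connectFromField]);
        }
      }
    }
    frontier = next;
  }
  return { ...doc, res: found };
}

// Unsplit plan: every shard runs the whole pipeline against its local "bar".
function runUnsplit() {
  return Object.values(shards).flatMap((shard) =>
    shard.foo.map(project).map((doc) =>
      graphLookup(doc, shard.bar, doc.l_x, "x", "_id")));
}

// Split plan: shards run $project, then the merging half runs the lookup
// once, against the primary shard's "bar".
function runSplit() {
  const merged = Object.values(shards).flatMap((shard) => shard.foo.map(project));
  const bar = shards[primaryShard].bar;
  return merged.map((doc) => graphLookup(doc, bar, doc.l_x, "x", "_id"));
}
```

In this model, runUnsplit() leaves res empty for the documents stored on shardB, while runSplit() populates res for every document.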

For example, see the following running against a mongos:

mongos> db.adminCommand({enableSharding: "test"})
...
mongos> db.adminCommand({shardCollection: "test.foo", key: {_id: "hashed"}})
...
mongos> db.foo.insert([{}, {}, {}, {}])
...
mongos> db.bar.insert({_id: 1, x: 1})
WriteResult({ "nInserted" : 1 })
mongos> db.foo.aggregate([{$project: {l_x: {$literal: 1}}}, {$graphLookup: {from: "bar", startWith: "$l_x", connectFromField: "x", connectToField: "_id", as: "res"}}])
{ "_id" : ObjectId("5c48f159a3c842122a8dec28"), "l_x" : 1, "res" : [ ] }  // Uh oh, these should have the same results as below.
{ "_id" : ObjectId("5c48f159a3c842122a8dec29"), "l_x" : 1, "res" : [ ] }
{ "_id" : ObjectId("5c48f159a3c842122a8dec26"), "l_x" : 1, "res" : [ { "_id" : 1, "x" : 1 } ] }
{ "_id" : ObjectId("5c48f159a3c842122a8dec27"), "l_x" : 1, "res" : [ { "_id" : 1, "x" : 1 } ] }
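
One way to check where the stage was placed is to inspect the explain output for the same pipeline. The helper below is a sketch under assumptions: graphLookupPlacement is a hypothetical name, and it assumes the explain output from mongos exposes a splitPipeline field with shardsPart and mergerPart arrays, which may not hold on every server version.

```javascript
// Hypothetical helper. ASSUMPTION: the explain document has the shape
// { splitPipeline: { shardsPart: [...], mergerPart: [...] } } when the
// pipeline was split, and a null or missing splitPipeline otherwise.
function graphLookupPlacement(explainOutput) {
  const split = explainOutput.splitPipeline;
  if (!split) {
    return "unsplit"; // the whole pipeline was sent to the shards
  }
  const hasGraphLookup = (stages) =>
    (stages || []).some((stage) =>
      Object.prototype.hasOwnProperty.call(stage, "$graphLookup"));
  if (hasGraphLookup(split.shardsPart)) return "shards";
  if (hasGraphLookup(split.mergerPart)) return "merger";
  return "none";
}

// Possible usage from a shell connected to mongos (not run here):
//   graphLookupPlacement(db.foo.explain().aggregate([ /* pipeline above */ ]))
```

If the stage behaves as this ticket requests, the helper would be expected to report "merger" rather than "unsplit" or "shards".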



 Comments   
Comment by Githook User [ 19/Feb/19 ]

Author: Martin Neupauer <martin.neupauer@mongodb.com> (MartinNeupauer)

Message: SERVER-39166 $graphLookup should force a pipeline to split in sharded cluster

(cherry picked from commit b8231a4b5a25a957219c9c2e6b51f93c674e0b37)
Branch: v4.0
https://github.com/mongodb/mongo/commit/93da6c90a40bd1793503552e592e9dcf25a73a67

Comment by Githook User [ 14/Feb/19 ]

Author: Martin Neupauer <martin.neupauer@mongodb.com> (MartinNeupauer)

Message: SERVER-39166 $graphLookup should force a pipeline to split in sharded cluster
Branch: v3.6
https://github.com/mongodb/mongo/commit/1a7440011639302ca72df4d7c9395fceeda042bd

Comment by Githook User [ 13/Feb/19 ]

Author: Martin Neupauer <martin.neupauer@mongodb.com> (MartinNeupauer)

Message: SERVER-39166 $graphLookup should force a pipeline to split in sharded cluster
Branch: v3.4
https://github.com/mongodb/mongo/commit/55583d2bfe6fa67223751724ae08c5688f46c04c

Comment by Githook User [ 30/Jan/19 ]

Author: Martin Neupauer <martin.neupauer@mongodb.com>

Message: SERVER-39166 blacklist graph_lookup.js on the mixed shards
Branch: master
https://github.com/mongodb/mongo/commit/0d55de74bbf0dca74d1c4f08b61404c5606fa67e

Comment by Githook User [ 29/Jan/19 ]

Author: Martin Neupauer <martin.neupauer@mongodb.com> (MartinNeupauer)

Message: SERVER-39166 $graphLookup should force a pipeline to split in sharded cluster
Branch: master
https://github.com/mongodb/mongo/commit/b8231a4b5a25a957219c9c2e6b51f93c674e0b37

Comment by Charlie Swanson [ 23/Jan/19 ]

martin.neupauer has a patch that will fix this on master as part of SERVER-32666. I think we should consider splitting out the bug-fix part of that patch and using it for the backports. This assumes the patch would cherry-pick at all, since we changed some of the pipeline-splitting 'mergingLogic' code on master.
