[SERVER-22622] Improve $lookup explain to indicate query plan on the "from" collection Created: 16/Feb/16  Updated: 30/Jan/24

Status: Needs Scheduling
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Sergey Kazankov Assignee: Chris Harris
Resolution: Unresolved Votes: 51
Labels: SWDI, query-product-scope-2, query-product-urgency-2, query-product-value-2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File lookup_test.js    
Issue Links:
Depends
Related
related to COMPASS-7062 Query Performance Explain Summary for... Open
related to SERVER-26789 Logging for $lookup less complete in ... Backlog
related to SERVER-53762 Report aggregate execution stats in e... Closed
is related to SERVER-50246 $unionWith explain loses information ... Closed
Assigned Teams:
Query Execution
Participants:
Case:

 Description   

The explain output of the $lookup stage doesn't show any information about the query that would be executed on the "from" collection. This can lead to confusion about whether or not $lookup is using an index when the document source cursor on the collection that the "aggregate" command was run on is performing a collection scan.


Original description

Lookup don't use indexes. Here is the test code:

db.goods.drop();
db.persons.drop();
db.goods_persons.drop();
 
db.goods.insert({'id': 1, 'name': 'good1'});
db.goods.insert({'id': 2, 'name': 'good2'});
 
db.persons.insert({'id': 1, 'name': 'john'});
db.persons.insert({'id': 2, 'name': 'tom'});
 
db.goods_persons.insert({'good': 1, 'person': 1});
db.goods_persons.insert({'good': 2, 'person': 1});
 
db.goods_persons.insert({'good': 1, 'person': 2});
 
 
//create all probably indexes
db.goods.ensureIndex({'id':1});
db.persons.ensureIndex({'id':1});
db.goods_perons.ensureIndex({'good':1});
db.goods_perons.ensureIndex({'person':1});
db.goods.ensureIndex({'goods_persons.person':1});
db.goods.ensureIndex({'goods_persons.good':1});
 
//get buyers
db.goods.aggregate(
[
    {$lookup:{
        from: "goods_persons",
        localField: "id",
        foreignField: "good",
        as: "goods_persons"
    }},
    {$match:{
        "goods_persons.person": 1
    }}
], {'explain': true}); //COLLSCAN   -  index not used



 Comments   
Comment by Eric Sedor [ 07/Aug/23 ]

601290552@qq.com For what you are reporting we'd like to encourage you to start by asking our community for help by posting on the MongoDB Developer Community Forums.

If the discussion there leads you to suspect a bug in the MongoDB server, then we'd want to investigate it as a possible bug here in the SERVER project. Unless you find solid evidence to suggest it is the issue described by this ticket, it would be a new ticket.

Sincerely,
Eric

Comment by jing xu [ 04/Aug/23 ]

hi Christopher Harris:
my server is 6.0.5.but insidepipeline output is all 0
totalDocsExamined: Long("0"),
totalKeysExamined: Long("0"),
collectionScans: Long("0"),
indexesUsed: [],
nReturned: Long("0"),
$lookup
{
'$lookup':

{ from: 'xiaoxu', as: 'output', localField: 'waybillno', foreignField: 'waybillno' }

,
totalDocsExamined: Long("0"),
totalKeysExamined: Long("0"),
collectionScans: Long("0"),
indexesUsed: [],
nReturned: Long("0"),
executionTimeMillisEstimate: Long("0")
},
{
'$match': { output: { '$not':

{ '$eq': [] }

} },
nReturned: Long("0"),
executionTimeMillisEstimate: Long("0")
}

Comment by Viktor Molokostov [ 11/Nov/22 ]

Thanks @christopher.harris@mongodb.com , it looks like my client (IntellijIDEA) is a bit limited in this matter, I'll try to explore it more with Compass.

Comment by Chris Harris [ 09/Nov/22 ]

Beginning in version 5.0, explain output has been meaningfully enhanced when it comes to $lookup via SERVER-53762. Since then, running explain with a verbosity mode that gathers execution stats will additionally report information about collection scans, index scans, and objects examined for the $lookup portion of the output. For example:

		{
			"$lookup" : {
				"from" : "foreign",
				"as" : "output",
				"localField" : "localField",
				"foreignField" : "foreignField"
			},
			"nReturned" : NumberLong(2),
			"executionTimeMillisEstimate" : NumberLong(3),
			"totalDocsExamined" : NumberLong(2),
			"totalKeysExamined" : NumberLong(2),
			"collectionScans" : NumberLong(5),
			"indexesUsed" : ["IDX1", "IDX2"]
		}
 

This significantly improves the diagnosability and usability of the information related to $lookup directly reported by explain. We will revisit this request to further enhance this output at a later date when we make broader changes to the explain command.

Comment by Viktor Molokostov [ 08/Nov/22 ]

Are there any plans to add lookup stats into aggregation explain?

Comment by Asya Kamsky [ 11/Nov/16 ]

The same should apply to $graphLookup stage.

Comment by Max Hirschhorn [ 14/Mar/16 ]

Hi kazankov,

I've converted this ticket into an improvement request for the explain output of the $lookup stage. Thanks for bringing this issue to our attention!

Best,
Max

Comment by Max Hirschhorn [ 17/Feb/16 ]

There isn't a way to run an explain on an aggregation pipeline and see what index is being used to satisfy the $lookup stage. What I was trying to get at by saying

The $lookup stage simply executes a query of the form {<foreignField>: {$eq: <localField value>}} on the <from> collection for each source document.

is that it's possible to run a query of the aforementioned form against the <from> collection to see what query plan would be used to answer it. As I showed in my previous comment, we will indeed use the {good: 1} index on the test.goods_persons collection.

If you need more convincing, then we can also examine the plan summary output written to the logs while executing the aggregation pipeline from the ticket's description (with explain=false and setting the mongod's log level to 1).

2016-02-16T22:27:56.929-0500 I QUERY    [conn1] query test.goods_persons query: { good: { $eq: 1.0 } } planSummary: IXSCAN { good: 1.0 } ntoreturn:0 ntoskip:0 keysExamined:2 docsExamined:2 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:124 locks:{ Global: { acquireCount: { r: 6 } }, Database: { acquireCount: { r: 3 } }, Collection: { acquireCount: { r: 3 } } } 0ms
2016-02-16T22:27:56.929-0500 I QUERY    [conn1] query test.goods_persons query: { good: { $eq: 2.0 } } planSummary: IXSCAN { good: 1.0 } ntoreturn:0 ntoskip:0 keysExamined:1 docsExamined:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:72 locks:{ Global: { acquireCount: { r: 8 } }, Database: { acquireCount: { r: 4 } }, Collection: { acquireCount: { r: 4 } } } 0ms
2016-02-16T22:27:56.930-0500 I COMMAND  [conn1] command test.goods command: aggregate { aggregate: "goods", pipeline: [ { $lookup: { from: "goods_persons", localField: "id", foreignField: "good", as: "goods_persons" } }, { $match: { goods_persons.person: 1.0 } } ], explain: false, cursor: {} } keyUpdates:0 writeConflicts:0 numYields:0 reslen:412 locks:{ Global: { acquireCount: { r: 10 } }, Database: { acquireCount: { r: 5 } }, Collection: { acquireCount: { r: 5 } } } protocol:op_command 0ms

We see two queries on the test.goods_persons collection: {good: {$eq: 1.0}} and {good: {$eq: 2.0}}, both of which are using the {good: 1} index on the test.goods_persons collection. This is precisely what we should expect given my description of how the $lookup stage is implemented and the explain output I showed earlier.

The real problem for me that 'goods' collection count is 10 millions and 'persons' count is 20. I just wanted to find goods for one person but with personal information.

As you alluded to in this ticket's description, the aggregation pipeline is doing a collection scan of the test.goods collection. This means that every purchase will be examined, regardless of who made it. Since you've now said that you're only interested in the goods purchased by person #1, why not change your aggregation pipeline to run against the test.goods_persons collection and apply a filter to process only those goods purchased by person #1?

From the explain output, we can verify that the following aggregation pipeline will use the {person: 1} index on the test.goods_persons collection to examine only person #1's purchases. The information for the goods purchased will then be looked up (individually, per the $lookup stage) by doing queries of the form {id: {$eq: <good>}} on the test.goods collection. These queries will be satisfied by using the {id: 1} index on the test.goods collection.

> db.goods_persons.createIndex({person: 1});
> db.goods.createIndex({id: 1});
> db.goods_persons.aggregate([
  {$match: {person: 1}},
  {$lookup:{
      from: "goods",
      localField: "good",
      foreignField: "id",
      as: "goods_info"
  }},
], {explain: true});
{
  "waitedMS" : NumberLong(0),
  "stages" : [
    {
      "$cursor" : {
        "query" : {
          "person" : 1
        },
        "queryPlanner" : {
          "plannerVersion" : 1,
          "namespace" : "test.goods_persons",
          "indexFilterSet" : false,
          "parsedQuery" : {
            "person" : {
              "$eq" : 1
            }
          },
          "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
              "stage" : "IXSCAN",
              "keyPattern" : {
                "person" : 1
              },
              "indexName" : "person_1",
              "isMultiKey" : false,
              "isUnique" : false,
              "isSparse" : false,
              "isPartial" : false,
              "indexVersion" : 1,
              "direction" : "forward",
              "indexBounds" : {
                "person" : [
                  "[1.0, 1.0]"
                ]
              }
            }
          },
          "rejectedPlans" : [ ]
        }
      }
    },
    {
      "$lookup" : {
        "from" : "goods",
        "as" : "goods_info",
        "localField" : "good",
        "foreignField" : "id"
      }
    }
  ],
  "ok" : 1
}
> db.goods.explain().find({id: {$eq: 1}});
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "test.goods",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "id" : {
        "$eq" : 1
      }
    },
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "id" : 1
        },
        "indexName" : "id_1",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
          "id" : [
            "[1.0, 1.0]"
          ]
        }
      }
    },
    "rejectedPlans" : [ ]
  },
  "serverInfo" : {
    "host" : "gopher-blue",
    "port" : 27017,
    "version" : "3.2.1",
    "gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2"
  },
  "ok" : 1
}

Comment by Sergey Kazankov [ 16/Feb/16 ]

There is no lookup query in your example.

My code is just for test. The real problem for me that 'goods' collection count is 10 millions and 'persons' count is 20. I just wanted to find goods for one person but with personal information. I used lookup + match like I used join in sql. And in mongo this query took a very-very long time. And result count was about 10 goods. Thats why I desided that no index was used.

Comment by Max Hirschhorn [ 16/Feb/16 ]

The COLLSCAN in the explain output is referring to the scan of the test.goods collection (i.e. the collection the "aggregate" command is being run on) and doesn't mean an index isn't being used to answer the $lookup. The $lookup stage simply executes a query of the form {<foreignField>: {$eq: <localField value>}} on the <from> collection for each source document. Running the "explain" command on the test.goods_persons collection after having created an index {good: 1}, we can see that the $lookup stage will indeed use that index.

> db.goods_persons.createIndex({good: 1});
> db.goods_persons.explain().find({good: {$eq: 1}});
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "test.goods_persons",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "good" : {
        "$eq" : 1
      }
    },
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "good" : 1
        },
        "indexName" : "good_1",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
          "good" : [
            "[1.0, 1.0]"
          ]
        }
      }
    },
    "rejectedPlans" : [ ]
  },
  "serverInfo" : {
    "host" : "gopher-blue",
    "port" : 27017,
    "version" : "3.2.1",
    "gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2"
  },
  "ok" : 1
}

Generated at Thu Feb 08 04:00:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.