- 
    Type:Bug 
- 
    Resolution: Unresolved
- 
    Priority:Minor - P4 
- 
    None
- 
    Affects Version/s: None
- 
    Component/s: Distributed Query Execution
- 
    None
- 
        Query Execution
- 
        ALL
- 
        
- 
        None
- 
        None
- 
        None
- 
        None
- 
        None
- 
        None
- 
        None
When explaining a $lookup query (and the $lookup is not pushed down to SBE) in "executionStats" verbosity mode or higher, we report several runtime statistics pertaining to the execution on the inner side:
- totalDocsExamined
- totalKeysExamined
- collectionScans
- indexesUsed
The inner collection may be sharded, and in the general case this means that we may have to target multiple shards. When this happens, the system is unable to gather runtime statistics on the inner side. This is because we currently lack a mechanism for transmitting statistics (alongside the intermediate result set) between shards when running a sharded explain. When the system is unable to gather runtime statistics for a sharded inner collection, we end up with something like the following:
{
    "$lookup" : {
                "from" : "inner",
                "as" : "joined",
                "localField" : "x",
                "foreignField" : "x"
        },
        "nReturned" : NumberLong(2),
        "executionTimeMillisEstimate" : NumberLong(23),
        "totalDocsExamined" : NumberLong(0),
        "totalKeysExamined" : NumberLong(0),
        "collectionScans" : NumberLong(0),
        "indexesUsed" : [ ]
}
Taken at face value, this makes it seem like the $lookup returned 2 documents in 23 milliseconds, but examined no keys or documents on the inner side and performed neither any collection scans or index scans. In reality, 2 docs and 2 keys were examined on the inner side, and 2 index scans were performed against the {key: 1} index. Please see the "Steps to Reproduce" for a detailed script that produces the erroneous output. What makes this even more confusing is that in my example, all of the data for both the inner and outer collections happens to reside on the primary shard – but the reporting issue still arises because the routing table for the inner collection is such that the $lookup needs to target both collections.
Ideally we would improve the system so that these statistics can be accurately reported in a sharded cluster. This is likely easier said than done because it requires communicating statistics and intermediate results alongside each other across the wire. A stopgap solution would be to change the output to somehow make it clear that statistics are known to be missing. Perhaps we could count the number of times that we had to target multiple shards on the inner side, with a note that when this happens the runtime stats can be under-reported?
- is related to
- 
                    SERVER-79702 $lookup subpipeline exectionStats output all 0 -         
- Closed
 
-