Loading...

Type: Improvement
Resolution: Works as Designed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.2.6, 4.2.8, 4.4.0-rc12
Component/s: Aggregation Framework
Labels:
None

Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When doing an aggregation from a somehow larger collection (260k documents in my case), having one or more lookup stage(s) towards small collections (94 and 207 documents in my case), the performance further in the pipeline is slow (doing a sort on an embedded key from the lookup, or anything else (big skip, rare match,...) which forces a full pass to happen).

In my example (see below for exact pipeline and data), we start from 260k documents, lookup in two very small other collections (on an indexed field), project to collapse a embedded key from the lookup tables and then sort on this projected field. This runs for 53 seconds.

After analysing this for a while, and looking at mongotop, i notices that most time is spent reading from the lookup-tables. However, even when reducing the dataset to a couple of megabytes and making sure the WT cache is certainly big enough (1.5G) to hold everything (data, indexes and intermediate results), I cannot get my head around it why this is this slow ... I understand it has to do 260k lookups, twice (so, 500k-ish) due to the double lookup, but these happen on an indexed field in a dataset of minimal size.

I timed doing 1 million simple finds on the indexed field in the larger of the two joined collections, and this takes about 8 seconds (which is double the amount of what should be done).

I realise that doing lookups is expensive, but it feels like there is something going wrong in the lookup phase - this should be an all in-memory operation.

The indices on the lookup tables are used in the aggregation pipeline, as without them, it takes +140 seconds. The "delete" match at the beginning doesn't reduce the dataset a lot (approx 1%), leaving that out makes little difference.

I tried on 4.2.6, 4.2.8 and the 4.4.0-rc.12, all results are equal, give or take a few tenths of a second. All done in docker for mac (recent macbook pro), 2GB RAM, 1.5GB assigned as WT cache, nothing else running. We see this as well on 64GB Google-Cloud SSD-backed nodes, and equal Aliyun nodes, WT cache set to 70%).

Datamodel examples:

Alert:

{
    "_id" : ObjectId("5e154fc78dde24bb1eb2277d"),
    "item" : {
        "model" : {
            "urn" : "alertmodel:5e15375d0bb3bf0001c8454a:v3"
        }
    },
    "deleted" : false
}

Index on `deleted`, `item.model.urn`.

AlertModel:

{
    "_id" : ObjectId("5e1534178dde24bb1eac3bc4"),
    "item" : {
        "name" : {
            "text" : "Sample data"
        }
    },
    "urn" : "alertmodel:5e153417222cc60001b4eb3c:v18"
}

index on `urn`

AlertModelVersions:

{
    "_id" : ObjectId("5e1540578dde24bb1eacc0ed"),
    "item" : {
        "name" : {
            "text" : "Sample data"
        }
    },
    "urn" : "alertmodel:5e15375d0bb3bf0001c8454a:v1"
}

index on `urn`.

Aggregation pipeline:

db.Alert.aggregate([db.Alert.aggregate([ 
{ "$match" : { "deleted" : false } }, 
{ "$lookup" : { 
  "localField" : "item.model.urn", 
  "as" : "joined.item.model", 
  "from" : "AlertModel", 
  "foreignField" : "urn" } }, 
{ "$lookup" : { 
  "localField" : "item.model.urn", 
  "as" : "versionJoined.item.model", 
  "from" : "AlertModelVersions", 
  "foreignField" : "urn" } }, 
{ "$project" : { 
  "joined.item.model" : { 
    "$arrayElemAt" : [ { "$map" : { "input" : { "$concatArrays" : [ "$joined.item.model", "$versionJoined.item.model" ] }, "as" : "object", "in" : "$$object" } },
    0 ] }, 
} },      
{ "$sort" : { "joined.item.model.item.name.text" : -1 } },   
{ "$limit" : 50 }, 
], 
{"allowDiskUse": true})

Explain:

{
    "stages" : [ 
        {
            "$cursor" : {
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "atl.Alert",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "deleted" : {
                            "$eq" : false
                        }
                    },
                    "queryHash" : "3BDEB6AE",
                    "planCacheKey" : "71819335",
                    "winningPlan" : {
                        "stage" : "PROJECTION_DEFAULT",
                        "transformBy" : {
                            "_id" : 1,
                            "item.model.urn" : 1,
                            "joined" : 1,
                            "versionJoined.item.model" : 1
                        },
                        "inputStage" : {
                            "stage" : "FETCH",
                            "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                    "deleted" : 1.0,
                                    "item.model.urn" : 1.0
                                },
                                "indexName" : "deleted_1_item.model.urn_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                    "deleted" : [],
                                    "item.model.urn" : []
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                    "deleted" : [ 
                                        "[false, false]"
                                    ],
                                    "item.model.urn" : [ 
                                        "[MinKey, MaxKey]"
                                    ]
                                }
                            }
                        }
                    },
                    "rejectedPlans" : []
                }
            }
        }, 
        {
            "$lookup" : {
                "from" : "AlertModel",
                "as" : "joined.item.model",
                "localField" : "item.model.urn",
                "foreignField" : "urn"
            }
        }, 
        {
            "$lookup" : {
                "from" : "AlertModelVersions",
                "as" : "versionJoined.item.model",
                "localField" : "item.model.urn",
                "foreignField" : "urn"
            }
        }, 
        {
            "$project" : {
                "_id" : true,
                "joined" : {
                    "item" : {
                        "model" : {
                            "$arrayElemAt" : [ 
                                {
                                    "$map" : {
                                        "input" : {
                                            "$concatArrays" : [ 
                                                "$joined.item.model", 
                                                "$versionJoined.item.model"
                                            ]
                                        },
                                        "as" : "object",
                                        "in" : "$$object"
                                    }
                                }, 
                                {
                                    "$const" : 0.0
                                }
                            ]
                        }
                    }
                }
            }
        }, 
        {
            "$sort" : {
                "sortKey" : {
                    "joined.item.model.item.name.text" : -1
                },
                "limit" : NumberLong(50)
            }
        }
    ],
    "serverInfo" : {
        "host" : "9344216bd23c",
        "port" : 27017,
        "version" : "4.4.0-rc12",
        "gitVersion" : "4b96db94125b5e0ca1da0c91f09a5fca8c94d9ab"
    },
    "ok" : 1.0
}

I can give a complete dataset if you want (1.7 MB compressed mongodump). Let me know if you need more info.

(I've left the priority field below at the default level)

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

dataset.tgz
1.60 MB
Jul 17 2020 08:51:08 PM UTC
diagnostic.data.tgz
55 kB
Jul 17 2020 08:51:07 PM UTC
gdb_sort.txt
386 kB
Jul 23 2020 09:27:58 PM UTC
gdb.html
41 kB
Jul 23 2020 09:28:06 PM UTC
tree.png
326 kB
Jul 24 2020 12:43:09 PM UTC

related to

SERVER-21284 $lookup should cache query results

Backlog

SERVER-21312 $lookup should batch query requests

Backlog

Details

Description

Attachments

Attachments

Issue Links

Forms

Activity

People

Dates