Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-40235

Is there a workaround to improve perfs on $lookup with $expr ?

    XMLWordPrintable

    Details

      Description

      Hello,

      I'm facing the `16Mb doc size` hard limit on an aggregation using a `$lookup`.

      I have two collections: `tag` and `video`

      A `tag` looks like this:

      {
          "_id" : ObjectId("5c81217506b84b3e13cc106c"),
          "..."
          "created_at" : ISODate("2015-03-02T09:51:26.315Z"),
          "updated_at" : ISODate("2015-03-02T09:51:26.315Z"),
      }

      A `video` looks like this:

      {
          "_id" : ObjectId("5c81223606b84b3e13cca867"),
          {...},
          "tags" : [ 
              {
                  "_id" : ObjectId("5c81217606b84b3e13cc27e5"),
                  {...}
              },
              {
                  "_id" : ObjectId("5c81217506b84b3e13cc148c"),
                  "publish_id" : "4f10596a-1a7c-4a18-b81d-0c33c4887425",
                  "name" : "jenifer the voice",
                  "label" : "Jenifer The Voice",
                  "slug" : "jenifer-the-voice"
              }, 
          ]
      }
      

      So a `video` can have multiple tags (embded documents) and a `
      tag` can be used in multiple `videos`

      In `tag` collection, I need to know how much `video` are related to a tag and (for now) I'm constrained to calculate this data everytime I do a find on my interface (yep that's not really cool).

      So I added an index on `video`collection for `tags._id` and made the following aggregation:

      db.getCollection('tag').aggregate([
        {"$match":  {"_id":  {"$in": [ ... ]}}},
        {"$lookup": { "from": "video", "localField": "_id", "foreignField": "tags._id", "as": "videoTags"}},
        {"$project": {
          "_id": 1,
          "name": "$name",
          "publish_id": "$publish_id",
          "label": "$label",
          "slug": "$slug",
          "created_at": "$created_at",
          "updated_at": "$updated_at",
          "sort": "$sort",
          "archived": "$archived",
          "parent_id": "$parent_id",
          "nbRelatedVideos": {"$size": "$videoTags"}
        }}
      ])
      

       
      That one works really great until I have too much videos related to a tag ...

      It appears that the lookup will set the entire `video` documents in a `videoTags` field ... so it can take a lot of space. As I only need to count the number of videos, I just need one field to count on (like `_id`).

      Maybe there is a way I can count that directly in the `$lookup` ?

      After some researches, I get to find a solution not to encounter the 16mb hard limit ... but it is clearly not fast as `$expr` does not use multiKey indexes:

       

      db.getCollection('tag').aggregate([
        {$match:  {"_id":  {$in: [ ... ]}}},
        {$lookup: {
            from: "video",
            as: "videoTags",
            let: { "tagId": "$_id" },
            pipeline: [
              {$unwind: "$tags"},
              {$match: {
                  $expr: {
                      $eq: ["$tags._id", "$$tagId"]
                  }
              }},
              {$project: {"_id": true}}
            ],
      //      "localField": "_id",
      //      "foreignField": "tags._id", 
        }},
        {"$project": {
          "_id": 1,
          "name": "$name",
          "publish_id": "$publish_id",
          "label": "$label",
          "slug": "$slug",
          "created_at": "$created_at",
          "updated_at": "$updated_at",
          "sort": "$sort",
          "archived": "$archived",
          "parent_id": "$parent_id",
          "nbRelatedVideos": {"$size": "$videoTags"}
        }},
      ])
      

       

      Is there any workaround to improve perfs here ?
      Maybe there is a way I can do the count directly in the `$lookup` ?

      I hope the question is clear enough, do not hesitate to ask for more information if needed.

        Attachments

          Activity

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: