Type: Question
Resolution: Done
Priority: Major - P3
Affects Version/s: 4.0.5
Component/s: Aggregation Framework, Index Maintenance, Performance
Hello,
I'm running into the 16MB document size hard limit on an aggregation using a `$lookup`.
I have two collections, `tag` and `video`.
A `tag` looks like this:
{ "_id" : ObjectId("5c81217506b84b3e13cc106c"), "..." "created_at" : ISODate("2015-03-02T09:51:26.315Z"), "updated_at" : ISODate("2015-03-02T09:51:26.315Z"), }
A `video` looks like this:
{ "_id" : ObjectId("5c81223606b84b3e13cca867"), {...}, "tags" : [ { "_id" : ObjectId("5c81217606b84b3e13cc27e5"), {...} }, { "_id" : ObjectId("5c81217506b84b3e13cc148c"), "publish_id" : "4f10596a-1a7c-4a18-b81d-0c33c4887425", "name" : "jenifer the voice", "label" : "Jenifer The Voice", "slug" : "jenifer-the-voice" }, ] }
So a `video` can have multiple tags (embedded documents) and a `tag` can be used in multiple `video` documents.
In the `tag` collection, I need to know how many `video` documents are related to each tag, and (for now) I have to compute this every time I run a find from my interface (yep, that's not really cool).
So I added an index on the `video` collection for `tags._id` and wrote the following aggregation:
```js
db.getCollection('tag').aggregate([
    { "$match": { "_id": { "$in": [ ... ] } } },
    { "$lookup": {
        "from": "video",
        "localField": "_id",
        "foreignField": "tags._id",
        "as": "videoTags"
    } },
    { "$project": {
        "_id": 1,
        "name": "$name",
        "publish_id": "$publish_id",
        "label": "$label",
        "slug": "$slug",
        "created_at": "$created_at",
        "updated_at": "$updated_at",
        "sort": "$sort",
        "archived": "$archived",
        "parent_id": "$parent_id",
        "nbRelatedVideos": { "$size": "$videoTags" }
    } }
])
```
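For reference, the index mentioned above was created along these lines:

```js
// Multikey index on the embedded tags' _id field, so the plain
// localField/foreignField form of $lookup can use it.
db.getCollection('video').createIndex({ "tags._id": 1 })
```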
That aggregation works really well until a tag has too many related videos ...
It appears that the `$lookup` puts the entire matching `video` documents into the `videoTags` field ... so it can take up a lot of space. As I only need to count the videos, one field to count on (like `_id`) would be enough.
Maybe there is a way I can count that directly in the `$lookup`?
After some research, I found a solution that avoids the 16MB hard limit ... but it is clearly not fast, as `$expr` does not use multikey indexes:
```js
db.getCollection('tag').aggregate([
    { $match: { "_id": { $in: [ ... ] } } },
    { $lookup: {
        from: "video",
        as: "videoTags",
        let: { "tagId": "$_id" },
        pipeline: [
            { $unwind: "$tags" },
            { $match: { $expr: { $eq: ["$tags._id", "$$tagId"] } } },
            { $project: { "_id": true } }
        ]
        // "localField": "_id",
        // "foreignField": "tags._id",
    } },
    { "$project": {
        "_id": 1,
        "name": "$name",
        "publish_id": "$publish_id",
        "label": "$label",
        "slug": "$slug",
        "created_at": "$created_at",
        "updated_at": "$updated_at",
        "sort": "$sort",
        "archived": "$archived",
        "parent_id": "$parent_id",
        "nbRelatedVideos": { "$size": "$videoTags" }
    } }
])
```
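A variant I can think of keeps the `$lookup` output tiny by counting inside the sub-pipeline with `$count` instead of returning one document per match (the `videoCount` and `n` names are just placeholders I made up); as far as I can tell it still relies on `$expr`, so it hits the same index limitation:

```js
db.getCollection('tag').aggregate([
    { $match: { "_id": { $in: [ /* tag ids */ ] } } },
    { $lookup: {
        from: "video",
        as: "videoCount",
        let: { "tagId": "$_id" },
        pipeline: [
            // $in (aggregation expression) checks membership of $$tagId in the
            // array of embedded tag _ids, so no $unwind is needed. This assumes
            // every video document has a "tags" array.
            { $match: { $expr: { $in: ["$$tagId", "$tags._id"] } } },
            // Collapse all matches into a single { n: <count> } document,
            // so the lookup output stays far below the 16MB limit.
            { $count: "n" }
        ]
    } },
    { $project: {
        "_id": 1,
        // ... other projected fields as above ...
        // $count emits no document when there are no matches, hence $ifNull.
        "nbRelatedVideos": {
            $ifNull: [ { $arrayElemAt: ["$videoCount.n", 0] }, 0 ]
        }
    } }
])
```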
Is there any workaround to improve performance here?
Maybe there is a way I can do the count directly in the `$lookup`?
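The only index-friendly alternative I can think of is to flip the query around and count from the `video` side, where a plain `$match` on `tags._id` can use the multikey index (this is only a sketch, I have not benchmarked it):

```js
// Count, per tag, how many videos reference it; the first $match
// can use the multikey index on tags._id.
db.getCollection('video').aggregate([
    { $match: { "tags._id": { $in: [ /* tag ids */ ] } } },
    { $unwind: "$tags" },
    // Re-filter after $unwind: the first $match keeps whole videos,
    // including embedded tags that are not in the requested list.
    { $match: { "tags._id": { $in: [ /* tag ids */ ] } } },
    { $group: { _id: "$tags._id", nbRelatedVideos: { $sum: 1 } } }
])
```

The per-tag counts would then have to be merged back into the `tag` documents on the application side.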
I hope the question is clear enough; do not hesitate to ask for more information if needed.