Details

Type: Question
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.0.5
Component/s: None

Description
Hello,
I'm facing the `16Mb doc size` hard limit on an aggregation using a `$lookup`.
I have two collections: `tag` and `video`.
A `tag` looks like this:

```
{
    "_id" : ObjectId("5c81217506b84b3e13cc106c"),
    "..."
    "created_at" : ISODate("2015-03-02T09:51:26.315Z"),
    "updated_at" : ISODate("2015-03-02T09:51:26.315Z"),
}
```
A `video` looks like this:

```
{
    "_id" : ObjectId("5c81223606b84b3e13cca867"),
    {...},
    "tags" : [
        {
            "_id" : ObjectId("5c81217606b84b3e13cc27e5"),
            {...}
        },
        {
            "_id" : ObjectId("5c81217506b84b3e13cc148c"),
            "publish_id" : "4f10596a-1a7c-4a18-b81d-0c33c4887425",
            "name" : "jenifer the voice",
            "label" : "Jenifer The Voice",
            "slug" : "jenifer-the-voice"
        },
    ]
}
```
So a `video` can have multiple tags (embedded documents) and a `tag` can be used in multiple `videos`.
In the `tag` collection, I need to know how many `video` documents are related to each tag, and (for now) I'm constrained to calculate this data every time I do a find from my interface (yep, that's not really cool).
So I added an index on the `video` collection for `tags._id` and made the following aggregation:
```
db.getCollection('tag').aggregate([
    {"$match": {"_id": {"$in": [ ... ]}}},
    {"$lookup": { "from": "video", "localField": "_id", "foreignField": "tags._id", "as": "videoTags"}},
    {"$project": {
        "_id": 1,
        "name": "$name",
        "publish_id": "$publish_id",
        "label": "$label",
        "slug": "$slug",
        "created_at": "$created_at",
        "updated_at": "$updated_at",
        "sort": "$sort",
        "archived": "$archived",
        "parent_id": "$parent_id",
        "nbRelatedVideos": {"$size": "$videoTags"}
    }}
])
```
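For reference, the index mentioned above would be a multikey index on the embedded tag ids. It is sketched below as a plain key-pattern object so it can be checked outside a live shell (the variable name is just illustrative):

```javascript
// Multikey index key pattern on the embedded tag ids inside `video.tags`.
// With documents shaped like { tags: [ { _id: ObjectId(...), ... } ] },
// MongoDB creates one index entry per array element, so matches on
// "tags._id" can stay indexed.
const videoTagIndexKey = { "tags._id": 1 };

// In the mongo shell (not runnable here):
//   db.getCollection('video').createIndex(videoTagIndexKey)
```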
That one works really great until I have too many videos related to a tag ...
It appears that the lookup will put the entire `video` documents in the `videoTags` field ... so it can take a lot of space. As I only need to count the videos, a single field to count on (like `_id`) would be enough.
Maybe there is a way I can do that count directly in the `$lookup`?
After some research, I found a solution that avoids the 16MB hard limit ... but it is clearly not fast, as `$expr` does not use multikey indexes:
```
db.getCollection('tag').aggregate([
    {$match: {"_id": {$in: [ ... ]}}},
    {$lookup: {
        from: "video",
        as: "videoTags",
        let: { "tagId": "$_id" },
        pipeline: [
            {$unwind: "$tags"},
            {$match: {
                $expr: {
                    $eq: ["$tags._id", "$$tagId"]
                }
            }},
            {$project: {"_id": true}}
        ],
        // "localField": "_id",
        // "foreignField": "tags._id",
    }},
    {"$project": {
        "_id": 1,
        "name": "$name",
        "publish_id": "$publish_id",
        "label": "$label",
        "slug": "$slug",
        "created_at": "$created_at",
        "updated_at": "$updated_at",
        "sort": "$sort",
        "archived": "$archived",
        "parent_id": "$parent_id",
        "nbRelatedVideos": {"$size": "$videoTags"}
    }},
])
```
Is there any workaround to improve performance here?
Maybe there is a way I can do the count directly in the `$lookup`?
I hope the question is clear enough; do not hesitate to ask for more information if needed.
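One pattern that sidesteps the size limit entirely is to put a `$count` stage inside the `$lookup` sub-pipeline, so each tag gets back a single tiny document instead of an array of full videos. Below is a sketch, assuming MongoDB 3.6+; the `$in`-based `$match` avoids the `$unwind`, though as an `$expr` match it may still not use the multikey index. The pipeline is written as a plain JS value, and the names `countingPipeline`, `videoCount`, and `n` are illustrative:

```javascript
// Sketch: count related videos inside the $lookup itself.
// At most one { n: <count> } document per tag is materialized,
// so the 16MB document limit is never approached.
const countingPipeline = [
  { $match: { _id: { $in: [ /* tag ids */ ] } } },
  { $lookup: {
      from: "video",
      let: { tagId: "$_id" },
      pipeline: [
        // True when the tag id appears among the embedded tags' _ids.
        { $match: { $expr: { $in: ["$$tagId", "$tags._id"] } } },
        { $count: "n" } // collapses all matches into a single document
      ],
      as: "videoCount"
  } },
  { $addFields: {
      // $count emits no document at all when there are zero matches,
      // hence the $ifNull fallback to 0.
      nbRelatedVideos: { $ifNull: [{ $arrayElemAt: ["$videoCount.n", 0] }, 0] }
  } },
  { $project: { videoCount: 0 } }
];

// In the mongo shell: db.getCollection('tag').aggregate(countingPipeline)
```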