|
Thanks for that, Mark. I guess it is really a question about performance, and more specifically about handling bucketed data. In our application, the first $match finds the relevant buckets. After some projection (including a $filter), an $unwind splits the remaining array of embedded documents into separate documents, which are then filtered again with a second $match. Depending on the parameters, that second $match could run against many thousands of documents output by the $unwind stage, so I had concerns about its performance and was looking for ways to optimise it.
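For readers following along, here is a rough sketch of the pipeline shape described above, written as plain Python dicts (the form pymongo accepts). The field names (`device_id`, `bucket_start`, `bucket_end`, `readings`, `ts`, `value`) and the function `build_bucket_pipeline` are hypothetical placeholders, not the actual schema from our application.

```python
from datetime import datetime


def build_bucket_pipeline(device_id, start, end, min_value):
    """Hypothetical bucketed query; all field names are assumptions."""
    return [
        # 1. Runs against the collection itself, so it can use an index
        #    (for example one on device_id and bucket_start).
        {"$match": {
            "device_id": device_id,
            "bucket_start": {"$lt": end},
            "bucket_end": {"$gte": start},
        }},
        # 2. Trim each bucket's embedded array before unwinding it.
        {"$project": {
            "device_id": 1,
            "readings": {"$filter": {
                "input": "$readings",
                "as": "r",
                "cond": {"$and": [
                    {"$gte": ["$$r.ts", start]},
                    {"$lt": ["$$r.ts", end]},
                ]},
            }},
        }},
        # 3. Emit one document per remaining embedded reading.
        {"$unwind": "$readings"},
        # 4. Filters the $unwind output stream, not the collection,
        #    so no index is available to this stage.
        {"$match": {"readings.value": {"$gte": min_value}}},
    ]


pipeline = build_bucket_pipeline(
    "sensor-1", datetime(2020, 1, 1), datetime(2020, 1, 2), min_value=10
)
```

With pymongo this list would be passed to `collection.aggregate(pipeline)`; only stage 1 touches the collection directly.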
|
|
Hello aharris,
Thank you for the report. You are correct that when two $match stages are separated by a different stage, only the first $match can use an index. However, a $match immediately followed by another $match would use the index for both stages. The reason for this behavior is that the structure of the data cannot be known ahead of time: other stages in the pipeline could modify it, so there is no way to maintain an index on that intermediate data. Additionally, the collection scan you see is not performed on the collection itself but on the output of the previous pipeline stage. Only the initial $match runs against the collection, and it should use an index scan if a suitable index is available.
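The "$match followed by $match" case lines up with MongoDB's documented aggregation optimisation in which adjacent $match stages are coalesced into a single $match whose conditions are combined with $and. The sketch below mimics that idea on a pipeline held as a list of Python dicts. It is a simplified illustration only, not the server's implementation, and the function name `coalesce_adjacent_matches` is hypothetical.

```python
def coalesce_adjacent_matches(pipeline):
    """Merge each run of consecutive $match stages into one $match via $and.

    Hypothetical helper illustrating the coalescence idea; it is not how
    the server actually rewrites pipelines.
    """
    result = []
    for stage in pipeline:
        if "$match" in stage and result and "$match" in result[-1]:
            prev = result[-1]["$match"]
            # Reuse the clause list if the previous stage is already an $and.
            clauses = prev["$and"] if list(prev) == ["$and"] else [prev]
            result[-1] = {"$match": {"$and": clauses + [stage["$match"]]}}
        else:
            result.append(stage)
    return result


pipeline = [
    {"$match": {"a": 1}},
    {"$match": {"b": 2}},
    {"$unwind": "$items"},
    {"$match": {"items.c": 3}},
]
optimised = coalesce_adjacent_matches(pipeline)
```

In this sketch the first two stages become one $match that can still run against the collection, while the $match after $unwind stays separate because it filters the $unwind output rather than stored documents.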
Thanks,
Mark
|