Details
- Bug
- Status: Closed
- Major - P3
- Resolution: Fixed
- 5.0.8, 5.0.6
- Fully Compatible
- ALL
- v6.0, v5.3, v5.0
- QE 2022-06-13, QO 2022-07-11, QO 2022-07-25
Description
I'm using a MongoDB aggregation pipeline with $sampleRate to improve my query performance, and I ran into a strange behavior that I don't understand.
Here is my aggregation pipeline, running on a big collection (1M+ documents):
[
  {
    '$match': {
      publishedAt: {
        '$gt': new Date('2021-04-27T22:00:00.000Z'),
        '$lt': new Date('2022-04-28T21:59:59.999Z')
      },
      //... some other matching fields
    }
  },
  {
    '$group': {
      _id: {
        keyWords: '$keyWords', // This is an Array<String>
        //... some other fields
      },
      first: { '$first': '$$CURRENT' }
    }
  },
  { '$match': { '$sampleRate': 0.25 } }, // This is where I do my sampling
  { '$replaceRoot': { newRoot: '$first' } },
  {
    '$project': {
      _id: true,
      //... some other fields
    }
  }
]
When I do this, I get approximately twice as many documents as when I swap the $replaceRoot and $sampleRate steps:
[
  {
    '$match': {
      publishedAt: {
        '$gt': new Date('2021-04-27T22:00:00.000Z'),
        '$lt': new Date('2022-04-28T21:59:59.999Z')
      },
      //... some other matching fields
    }
  },
  {
    '$group': {
      _id: {
        keyWords: '$keyWords', // This is an Array<String>
        //... some other fields
      },
      first: { '$first': '$$CURRENT' }
    }
  },
  { '$replaceRoot': { newRoot: '$first' } },
  { '$match': { '$sampleRate': 0.25 } }, // This is where I do my sampling
  {
    '$project': {
      _id: true,
      //... some other fields
    }
  }
]
I don't understand why. Since $replaceRoot only reshapes each document, both pipelines should return roughly the same number of documents.
Is there something I'm failing to understand, or is it a bug?
PS: I also posted this as a question here: https://stackoverflow.com/questions/72048023/mongodb-aggregate-pipeline-sampling-fail
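To make the expectation concrete, here is a minimal sketch in plain JavaScript (no MongoDB involved). It assumes that $sampleRate keeps each incoming document independently with the given probability and that $replaceRoot maps documents one to one. The helpers makeGroups, sampleRate and replaceRoot are hypothetical stand-ins written for this illustration, not server code.

```javascript
// Hypothetical stand-in for the $group output: one document per distinct group key.
function makeGroups(n) {
  return Array.from({ length: n }, (_, i) => ({
    _id: { keyWords: [`k${i}`] },
    first: { _id: i },
  }));
}

// Assumed model of { $match: { $sampleRate: rate } }:
// each document is kept independently with probability `rate`.
function sampleRate(docs, rate, rand = Math.random) {
  return docs.filter(() => rand() < rate);
}

// Model of { $replaceRoot: { newRoot: '$first' } }: a one-to-one mapping,
// so it never changes how many documents flow through.
function replaceRoot(docs) {
  return docs.map((doc) => doc.first);
}

const groups = makeGroups(100000);

// First pipeline order: sample, then replace the root.
const sampleThenReplace = replaceRoot(sampleRate(groups, 0.25));

// Second pipeline order: replace the root, then sample.
const replaceThenSample = sampleRate(replaceRoot(groups), 0.25);

console.log(sampleThenReplace.length, replaceThenSample.length);
```

Under these assumptions both counts land close to a quarter of the grouped documents, whichever order is used. A count that is about twice as large in one order is therefore not what the stage semantics alone would predict.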
Issue Links
- is caused by
  SERVER-39938 aggregation $match before $lookup optimization doesn't happen when $expr: $eq is used (Closed)