-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: 5.0.8, 5.0.6
-
Component/s: Query Execution, Query Planning
-
Fully Compatible
-
ALL
-
v6.0, v5.3, v5.0
-
QE 2022-06-13, QO 2022-07-11, QO 2022-07-25
I'm using a MongoDB aggregation pipeline with $sampleRate to improve my query performance, and I ran into behavior I don't understand.
Here is my aggregation pipeline, running on a large collection (1M+ documents):
[
  {
    '$match': {
      publishedAt: {
        '$gt': new Date('2021-04-27T22:00:00.000Z'),
        '$lt': new Date('2022-04-28T21:59:59.999Z')
      },
      //... some other matching fields
    }
  },
  {
    '$group': {
      _id: {
        keyWords: '$keyWords', // This is an Array<String>
        //... some other fields
      },
      first: { '$first': '$$CURRENT' }
    }
  },
  { '$match': { '$sampleRate': 0.25 } }, // This is where i do my sampling
  { '$replaceRoot': { newRoot: '$first' } },
  {
    '$project': {
      _id: true,
      //... some other fields
    }
  }
]
When I do this, I get approximately twice as many documents as when I swap the $replaceRoot and $sampleRate steps:
[
  {
    '$match': {
      publishedAt: {
        '$gt': new Date('2021-04-27T22:00:00.000Z'),
        '$lt': new Date('2022-04-28T21:59:59.999Z')
      },
      //... some other matching fields
    }
  },
  {
    '$group': {
      _id: {
        keyWords: '$keyWords', // This is an Array<String>
        //... some other fields
      },
      first: { '$first': '$$CURRENT' }
    }
  },
  { '$replaceRoot': { newRoot: '$first' } },
  { '$match': { '$sampleRate': 0.25 } }, // This is where i do my sampling
  {
    '$project': {
      _id: true,
      //... some other fields
    }
  }
]
I don't understand why. Since $replaceRoot emits exactly one document per input document, both pipelines should return roughly the same number of documents.
Is there something I'm misunderstanding, or is this a bug?
PS: I also posted this question here: https://stackoverflow.com/questions/72048023/mongodb-aggregate-pipeline-sampling-fail
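To make the expectation concrete, here is a minimal sketch in plain JavaScript (no database involved) of why the two stage orders should give about the same count. It assumes $sampleRate behaves like an independent coin flip per document, which matches the MongoDB documentation's description of it as equivalent to a $match on $rand being less than the rate. The names groups, sampleRate, and replaceRoot are hypothetical stand-ins for illustration, not server code, and the sketch says nothing about what the query optimizer actually does with these stages.

```javascript
// Hypothetical stand-in for the documents produced by the $group stage.
const groups = Array.from({ length: 100000 }, (_, i) => ({
  _id: { keyWords: ['k' + i] },
  first: { _id: i },
}));

// Assumed model of { $match: { $sampleRate: rate } }:
// keep each document independently with probability `rate`.
const sampleRate = (docs, rate) => docs.filter(() => Math.random() < rate);

// Assumed model of { $replaceRoot: { newRoot: '$first' } }:
// exactly one output document per input document.
const replaceRoot = (docs) => docs.map((d) => d.first);

// First pipeline order: sample, then replace the root.
const sampleThenReplace = replaceRoot(sampleRate(groups, 0.25)).length;

// Second pipeline order: replace the root, then sample.
const replaceThenSample = sampleRate(replaceRoot(groups), 0.25).length;

// Both counts should land close to 25% of the group count.
console.log(sampleThenReplace, replaceThenSample);
```

Under this model the stage order does not matter, so a consistent 2x difference between the two real pipelines points at how the server rewrites or executes them rather than at the sampling itself.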
- is caused by: SERVER-39938 aggregation $match before $lookup optimization doesn't happen when $expr: $eq is used (Closed)