Details
Bug
Status: Closed
Major - P3
Resolution: Fixed
5.0.8, 5.0.6
Fully Compatible
ALL
v6.0, v5.3, v5.0
QE 2022-06-13, QO 2022-07-11, QO 2022-07-25
Description
I'm using a MongoDB aggregation pipeline with $sampleRate to improve my query performance. I ran into a strange behavior I don't understand.
Here is my aggregation pipeline, running on a big collection (1M+ documents):
  [
    {
      '$match': {
        publishedAt: {
          '$gt': new Date('2021-04-27T22:00:00.000Z'),
          '$lt': new Date('2022-04-28T21:59:59.999Z')
        },
        //... some other matching fields
      }
    },
    {
      '$group': {
        _id: {
          keyWords: '$keyWords', // This is an Array<String>
          //... some other fields
        },
        first: { '$first': '$$CURRENT' }
      }
    },
    { '$match': { '$sampleRate': 0.25 } }, // This is where I do my sampling
    { '$replaceRoot': { newRoot: '$first' } },
    {
      '$project': {
        _id: true,
        //... some other fields
      }
    }
  ]
When I do this, I get approximately twice as many documents as when I swap the $replaceRoot and $sampleRate stages:
  [
    {
      '$match': {
        publishedAt: {
          '$gt': new Date('2021-04-27T22:00:00.000Z'),
          '$lt': new Date('2022-04-28T21:59:59.999Z')
        },
        //... some other matching fields
      }
    },
    {
      '$group': {
        _id: {
          keyWords: '$keyWords', // This is an Array<String>
          //... some other fields
        },
        first: { '$first': '$$CURRENT' }
      }
    },
    { '$replaceRoot': { newRoot: '$first' } },
    { '$match': { '$sampleRate': 0.25 } }, // This is where I do my sampling
    {
      '$project': {
        _id: true,
        //... some other fields
      }
    }
  ]
I don't understand why. Both pipelines should give me roughly the same number of documents.
Is there something I'm failing to understand, or is it a bug?
PS: I created a question here: https://stackoverflow.com/questions/72048023/mongodb-aggregate-pipeline-sampling-fail
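To make my expectation concrete, here is a minimal sketch in plain JavaScript (no MongoDB involved). It assumes that $sampleRate keeps each input document independently with the given probability, which is my understanding of the stage and not something I have verified in the server code. The helpers `mulberry32`, `sampleRate` and `replaceRoot`, and the fake group data, are hypothetical stand-ins for illustration only.

```javascript
// Small deterministic PRNG (mulberry32) so that the run is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Assumption: { $match: { $sampleRate: rate } } keeps each document
// independently with probability `rate`.
function sampleRate(docs, rate, rand) {
  return docs.filter(() => rand() < rate);
}

// Stand-in for { $replaceRoot: { newRoot: '$first' } }: one output per input.
function replaceRoot(groups) {
  return groups.map((g) => g.first);
}

// Fake output of the $group stage: one document per group.
const groups = Array.from({ length: 100000 }, (_, i) => ({
  _id: { keyWords: ['kw' + i] },
  first: { _id: i },
}));

// Pipeline 1: sample, then replace the root.
const sampleThenReplace = replaceRoot(sampleRate(groups, 0.25, mulberry32(1)));
// Pipeline 2: replace the root, then sample.
const replaceThenSample = sampleRate(replaceRoot(groups), 0.25, mulberry32(2));

console.log(sampleThenReplace.length, replaceThenSample.length);
```

Since the stand-in for $replaceRoot maps each document to exactly one document, both counts in this sketch should land near a quarter of the number of groups, whichever stage runs first. That is the behavior I expected from the real pipelines.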
Issue Links
- is caused by SERVER-39938 aggregation $match before $lookup optimization doesn't happen when $expr: $eq is used (Closed)