-
Type:
Improvement
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Query Optimization
-
None
-
None
-
None
-
None
-
None
-
None
-
None
When running an aggregation pipeline where the implicit result is a distinct set filtering documents using the $in operator reverts to IXSCAN instead of DISTINCT_SCAN
Example:
const statuses = ["PENDING", "COMPLETE", "SHIPPED", "CANCELLED", "RETURNED"]; const bulk = []; for (let i = 1; i <= 1000; i++) { for (let r = 0; r < 10; r++) { bulk.push({ customer_id: i, status: statuses[Math.floor(Math.random() * statuses.length)] }); } } db.col1.insertMany(bulk); db.col1.createIndex({customer_id:1,status:1})
Running the following:
// Used DISTINCT_SCAN db.col1.explain("executionStats").distinct( "status", { customer_id: { $in: [1, 2, 3] } } ) // Used DISTINCT_SCAN db.col1.explain("executionStats").distinct( "status", { customer_id: { $in: [1, 500, 750] } } ) // Used DISTINCT_SCAN db.col1.aggregate([ { $match: { customer_id: { $in: [1] } } }, { $group: { _id: "$status" } } ]).explain("executionStats") // single value on $in // Used IXSCAN db.col1.aggregate([ { $match: { customer_id: { $in: [1, 2, 3] } } }, { $group: { _id: "$status" } } ]).explain("executionStats") // multiple values on $in
- The distinct method is picking the correct plan for this case.
- Aggregation rewritten into a distinct query where the $in operator is the same as $eq ($in: [1]= $eq: 1) is using DISTINCT_SCAN.
- Aggregation rewritten into a distinct query where the $in operator pass multiple values revert to IXSCAN.