[SERVER-27494] Avoid unwind on multi-key index field for covered index aggregation Created: 22/Dec/16  Updated: 08/Feb/23

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.4.0
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: 아나 하리 Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-54427 DISTINCT_SCAN can be used with compou... Backlog
Related
related to SERVER-37715 Use DISTINCT_SCAN for $unwind-$group ... Backlog
is related to SERVER-35223 $elemMatch forces mongo to fetch docu... Closed
is related to SERVER-9370 Optimize $match $unwind $match sequen... Closed
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

I think unwind stage can be avoid when aggregation query only need index field (multi-key index field). And Also aggregation grouping or sorting stage can be solved via sorted index without quick sort processing. (I assume multi-key index has same structure with normal b-tree index.)

We have below collections which has array field and multi-key index on that field.

db.mkey.insert({ "_id" : 1, "apps": [ "A", "B", "C"] })
db.mkey.insert({ "_id" : 2, "apps": ["A"] })
db.mkey.insert({ "_id" : 3, "apps": ["A", "C"] })
db.mkey.createIndex({apps:1})

And we want to aggregate apps and count for each app.

db.mkey.aggregate([
    {'$unwind': '$apps'},
    {'$group': { _id: '$apps', sum: { '$sum': 1 } } },
    {'$match': { sum: { '$gte': 2 } } }
  ])
 
{ "_id" : "A", "sum" : 3 }
{ "_id" : "C", "sum" : 2 }

This aggregation query can't use multi-key index and need to expand (unwind) apps field of all documents and sort.
But if this query can use multi-key index of apps field, aggregation can be processed via multi-key index only not data files and no unwind and no quick sort processing. And I think this can be much faster than full collection scan.



 Comments   
Comment by David Storch [ 12/Apr/22 ]

It looks that got closed only because of some bookkeeping around project planning. Re-opening and sending back to the triage queue.

Comment by Taylor Graham [ 03/Nov/19 ]

For those of us that rely on a multikey index for counts, this would be a massive win.

Comment by Asya Kamsky [ 08/Jun/18 ]

This could also cover a count with $elemMatch, essentially something equivalent to:

db.coll.aggregate({$unwind:"a"},{$match:{"a.b":1, "a.c"2}},{$group:{_id:"$_id"}},{$count:"c"})

is the same as

db.coll.count({a:{$elemMatch:{b:1,c:2}}})

when a is an array 100% of the time, but it could use a multikey index in a covered fashion since unwind won't treat non-array any differently than array.

Derived from related ticket SERVER-35223.

Comment by Kelsey Schubert [ 22/Dec/16 ]

Hi matt.lee,

Thank you for the improvement request. I'm sending this ticket to the Query Team for consideration.

Kind regards,
Thomas

Generated at Thu Feb 08 04:15:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.