[SERVER-69257] COUNT_SCAN plan not selected when unfiltered $group + $sum on _id performed Created: 30/Aug/22 Updated: 31/Oct/23 |
|
| Status: | Blocked |
| Project: | Core Server |
| Component/s: | Query Execution |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alex Bevilacqua | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||
| Sprint: | QO 2022-09-19, QO 2023-08-07, QO 2023-08-21 | ||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||
| Description |
|
As of
Without specifying a filter the empty $match stage would result in a COLLSCAN plan winning, though a COUNT_SCAN should be appropriate.
This could be worked around by unshifting a {$sort: { _id: 1 }} stage to the pipeline, however after discussing with david.storch@mongodb.com this may be addressable server-side. |
| Comments |
| Comment by Peter Volk [ 15/Aug/23 ] | |||||||||||||||||||||||||
|
Marking this as blocked by SERVER-23406 as the current performance numbers don't allow this patch to be merged as long as the index scans are significantly slower than a col scan. | |||||||||||||||||||||||||
| Comment by Milena Ivanova [ 07/Aug/23 ] | |||||||||||||||||||||||||
|
Local benchmark shows that for collections with a relatively small number of fields the collection scan is more efficient than the COUNT_SCAN over the _id index. Bellow are results for collections of size 10000, 100000, and 1 000 000. The documents have 6 fields {_id: i, a: i * 5, b: 100, c: 1000, d: "xxx", e: "abc".repeat(100) }. Each test was run 7 times and the first and last measurements were removed when computing the average. The following results were obtained when running with the SBE engine.
Similar results for collections with 3 fields, one idea faster over the large collection sizes. | |||||||||||||||||||||||||
| Comment by Charlie Swanson [ 06/Sep/22 ] | |||||||||||||||||||||||||
|
This does seem to be pretty easily addressed. This is a pretty hack-y patch I drummed up which seems to "fix" it and prefer a full COUNT_SCAN of the "_id" index over a COLLSCAN:
I'll nominate this for a quick win. It won't be super easy since I believe we'll want to benchmark it and possibly consider whether a non-id index should be eligible? Only non-partial, non-multi key indexes would work. I'm not sure how we'd effectively choose amongst them though, so the "_id" index may be the sensible default. | |||||||||||||||||||||||||
| Comment by Charlie Swanson [ 06/Sep/22 ] | |||||||||||||||||||||||||
|
Oops! Disregard my last comment. I confused myself by looking at the example in that linked ticket, which does not include a filter or a field path access:
It was later clarified that this was a typo though and "category" is meant to be a field path reference:
That's a pretty different query than the pure count in this ticket, so I will re-open and undo the duplication. I'll also fix up the example on that ticket | |||||||||||||||||||||||||
| Comment by Charlie Swanson [ 06/Sep/22 ] | |||||||||||||||||||||||||
|
Thanks christopher.harris@mongodb.com and nicholas.zolnierz@mongodb.com. This does appear to be a duplicate of SERVER-29444. The empty predicate is the same as no $match, and the two are both impacted by this bug of not generating an entire-index COUNT_SCAN. The ticket christopher.harris@mongodb.com linked is worth keeping in mind for any change like this, but I think described a filtered COUNT, which makes it eligible for index scans. |