[SERVER-13298] distinct doesn't use index to distinct multikey dotted fields Created: 20/Mar/14 Updated: 29/Jan/20 Resolved: 04/Apr/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | 2.4.10, 2.6.3, 3.0.14, 3.2.10, 3.5.13 |
| Fix Version/s: | 4.1.10 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Asya Kamsky | Assignee: | Ian Boros |
| Resolution: | Done | Votes: | 17 |
| Labels: | QFB, asya, storch | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||||
| Backport Requested: |
v4.0, v3.6
|
||||||||||||||||||||||||||||||||
| Sprint: | Query 2019-03-11, Query 2019-03-25, Query 2019-04-08 | ||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||||||
| Description |
|
Had a compound index on
where lineitems is an array. Did db.orders.distinct("lineitems.returnflag") which caused a collection scan. Non-multikey correctly used the index. |
| Comments |
| Comment by Githook User [ 04/Apr/19 ] | |||||||||||
|
Author: {'name': 'Ian Boros', 'username': 'puppyofkosh', 'email': 'puppyofkosh@gmail.com'}Message: | |||||||||||
| Comment by Danny Couture [ 17/Nov/16 ] | |||||||||||
|
important issue for us... preventing us from doing combobox for filtering Not sure if you keep "Affects Version/s" field up to date... but we reported this bug on 3.2.10... Really looking forward for the fix too! | |||||||||||
| Comment by Konstantin Bodnia [ 20/Oct/15 ] | |||||||||||
|
Ok, here's the case. We're having quite huge collection for healthcare insurance products with really huge amount of fields. For better indexing and nicer performance I created an object `_indexed` with like 30 subfields for different coverages.
etc... And they're arrays. I was hoping to get the values out of them to dynamically updated filters upon further and further filtering in our comparison tool. But ended up here. So for my case it would be much of a use. I could put them all into plain fields to avoid this problem, but it would be hell in projection and other things... Really looking forward for the fix! Thanks. | |||||||||||
| Comment by David Storch [ 01/Jul/14 ] | |||||||||||
|
Per discussion with hari.khalsa@10gen.com, this is not ready for commit just yet. The distinct optimization uses a special index access stage which returns the distinct index keys to its parent stage. In the multikey dotted case, however, the distinct stage would have to check for null or undefined keys. In the case of null or undefined, it must fetch the full document in order to disambiguate between literal nulls versus null by virtue of missing fields. We have decided to hold off unless we see that users really need this. | |||||||||||
| Comment by hari.khalsa@10gen.com [ 25/Mar/14 ] | |||||||||||
|
I mocked up a fix that would allow the embedded multikey index to be used. It would imply a change in behavior. For the data:
We expect this behavior currently:
But if we're using the index we'll have some "no such key found" entries appear in the output that we previously didn't consider part of the value set for distinct. |