[SERVER-13899] "Whole index scan" query solutions can use incompatible indexes, return incorrect results Created: 10/May/14 Updated: 15/Nov/21 Resolved: 13/May/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 2.6.1, 2.7.0 |
| Fix Version/s: | 2.6.2, 2.7.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andrew Zammit | Assignee: | J Rassi |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
version 2.6.0 |
||
| Issue Links: |
|
||||||||||||||||||||
| Backport Completed: | |||||||||||||||||||||
| Steps To Reproduce: |
|
||||||||||||||||||||
| Sprint: | Server 2.7.1 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Description |
|
Issue Status as of May 16, 2014 ISSUE SUMMARY USER IMPACT WORKAROUNDS AFFECTED VERSIONS FIX VERSION RESOLUTION DETAILS Original descriptionIf a particular field is defined as a hashed index, when sorting the result set is not in the expected order. When using a non-hashed (normal) index or no index at all on the specific String field, the result set's order is correct when sorting. This can be reproduced in the MongoDB shell, Node.js driver, and PHP driver, so I'm assuming this is a problem with the core server. The steps to reproduce show only how to reproduce the problem, but the following code illustrates the effect:
|
| Comments |
| Comment by Githook User [ 17/May/14 ] | ||||||||||||||||||||||||||
|
Author: {u'username': u'jrassi', u'name': u'Jason Rassi', u'email': u'rassi@10gen.com'}Message: | ||||||||||||||||||||||||||
| Comment by Githook User [ 16/May/14 ] | ||||||||||||||||||||||||||
|
Author: {u'username': u'jrassi', u'name': u'Jason Rassi', u'email': u'rassi@10gen.com'}Message: | ||||||||||||||||||||||||||
| Comment by J Rassi [ 13/May/14 ] | ||||||||||||||||||||||||||
Yes. All indexes are considered, and only some are eligible. | ||||||||||||||||||||||||||
| Comment by Andrew Zammit [ 13/May/14 ] | ||||||||||||||||||||||||||
|
Jason, That makes a bit more sense. Thanks for spelling it out. What if I had two indices? For example: a:1 and a:'hashed', although I'm not sure of the implications of this in practice. When using
| ||||||||||||||||||||||||||
| Comment by J Rassi [ 13/May/14 ] | ||||||||||||||||||||||||||
|
The fix is not quite as you describe. It affects one particular type of query plan: one that scans an entire index in order to avoid an in-memory sort. Consider how the index {a:1} could be used to help accelerate the query find({b:1}).sort({a:1}); a "whole index scan" on this index can provide the sort order for the query (and a filter is employed to discard results that don't match the query predicate). In practice this is often not a useful query plan, but it does work well if many documents match the query predicate and the user requests a limit. The only reason that this query plan is viable is that the index order for the {a:1} index exactly matches the sort order {a:1} requested. Since geo indexes, hashed indexes and text indexes do not store documents in this order, they can't be considered for this query plan (compound indexes like {a:1, b: "2dsphere"} do actually store documents in this order, but because these indexes are sparse on geo fields, additional logic is required to guarantee that these index scans won't miss results; see comment at query_planner.cpp:833). | ||||||||||||||||||||||||||
| Comment by Andrew Zammit [ 13/May/14 ] | ||||||||||||||||||||||||||
|
Thanks Jason for hopping on this so quickly. To confirm, the solution is to not use the index to sort unless it is a regular index (i.e. a:1 is regular, but a:'hashed' is irregular). Wouldn't performance degrade if it is not utilizing the index? You're the expert and perhaps I don't fully understand why text, 2dsphere, hashed, etc. can't be used, but I'm not convinced this is optimal solution. | ||||||||||||||||||||||||||
| Comment by Githook User [ 13/May/14 ] | ||||||||||||||||||||||||||
|
Author: {u'username': u'jrassi', u'name': u'Jason Rassi', u'email': u'rassi@10gen.com'}Message: | ||||||||||||||||||||||||||
| Comment by J Rassi [ 10/May/14 ] | ||||||||||||||||||||||||||
|
Andrew, thanks for reporting this issue. When the query planner is picking an index for a whole index scan solution (to provide a sort), it considers the index's sparseness flag but doesn't consider whether or not the index is of a special index type. Confirmed as a regression since 2.4. Incorrectly using a hashed index for a sort will generate results out of order (per repro above). Incorrectly using a text or geo index for a sort will similarly generate results out of order, and additionally can cause results to be missing from the result set:
|