[SERVER-21011] Certain queries against compound 2d/text indexes are incorrectly covered, return incorrect results Created: 19/Oct/15 Updated: 06/Dec/17 Resolved: 13/Oct/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.0.6 |
| Fix Version/s: | 3.4.11, 3.6.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Travis Redman | Assignee: | David Storch |
| Resolution: | Done | Votes: | 1 |
| Labels: | RF, bkp |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.4, v3.2 |
| Steps To Reproduce: |
In the query below, we request documents where b exists. As expected, we get 0 results.
When an index is added on (location, b), all documents are returned when $exists: true is set for b, which is incorrect.
|
| Sprint: | QuInt C (11/23/15), Query 2017-10-23 |
| Participants: |
| Description |
|
In MongoDB 3.0.6, compound 2d indexes do not filter results correctly when the $exists operator is applied to the second key of the index. Depending on which index is selected, this can cause incorrect results to be returned. |
| Comments |
| Comment by Githook User [ 27/Oct/17 ] |
|
Author: {'email': 'david.storch@10gen.com', 'name': 'David Storch', 'username': 'dstorch'} Message: The fix ensures that the tightness predicates over the (cherry picked from commit 744738bd23a5aed625dc1eed89851824fcf5e33a) Conflicts: |
| Comment by David Storch [ 13/Oct/17 ] |
|
After further investigation, it looks like the fix for this ticket also handles the problem described by |
| Comment by Githook User [ 13/Oct/17 ] |
|
Author: {'email': 'david.storch@10gen.com', 'name': 'David Storch', 'username': 'dstorch'} Message: The fix ensures that the tightness predicates over the |
| Comment by David Storch [ 06/Oct/17 ] |
|
After investigating this further, I have concluded that there are actually two distinct issues at play here. The originally reported issue here has to do with existence queries like {$exists: true}, {$eq: null}, and so on. This is the direct result of our unconditional use of INEXACT_COVERED tightness for predicates assigned to the trailing fields of a "2d" or "text" index:
https://github.com/mongodb/mongo/blob/33990519ca30e8a653aaca218c49539f5eba3468/src/mongo/db/query/planner_access.cpp#L341-L346
We can fix this by simply changing the blocks of code above to instead use INEXACT_FETCH tightness for existence predicates that can never be covered. The other issue only affects compound and multikey 2d indexes and is described in related ticket |
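The tightness choice proposed in the comment above can be sketched roughly as follows. This is a Python model, not the server's C++ code in planner_access.cpp: the names `Tightness`, `is_existence_predicate`, and `trailing_field_tightness` are hypothetical, and treating only `$exists` and `$eq: null` as existence predicates is an assumption based on the two examples the comment gives.

```python
from enum import Enum


class Tightness(Enum):
    # Only the two tightness values named in the ticket are modeled here.
    INEXACT_COVERED = "INEXACT_COVERED"  # filter may run against index key data
    INEXACT_FETCH = "INEXACT_FETCH"      # filter must run against the fetched document


def is_existence_predicate(pred):
    """Assumption: $exists and equality-to-null are the non-coverable cases."""
    op, operand = pred
    return op == "$exists" or (op == "$eq" and operand is None)


def trailing_field_tightness(pred):
    """Tightness for a predicate assigned to a trailing field of a 2d/text index."""
    if is_existence_predicate(pred):
        return Tightness.INEXACT_FETCH
    return Tightness.INEXACT_COVERED
```

Under this model, `("$exists", True)` and `("$eq", None)` would force a fetch, while an ordinary equality such as `("$eq", "foo")` would still be evaluated against the index key.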
| Comment by J Rassi [ 25/Apr/16 ] |
|
One way to fix this issue would be to stop assigning predicates to trailing fields of 2d/text indexes if any document has been indexed with an array value along this path (we likely would be able to use the path-level multikey tracking infrastructure for this purpose). If no such document has been indexed, we would then be able to generate tight bounds for these predicates. |
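A minimal sketch of the decision described in this proposal, assuming path-level multikey information is available as a plain set of field names. The functions `multikey_paths` and `plan_trailing_predicate`, the field names, and the returned labels are all hypothetical and only illustrate the idea; they do not correspond to server APIs.

```python
def multikey_paths(docs, trailing_fields):
    """Trailing index fields for which at least one document holds an array value."""
    return {f for d in docs for f in trailing_fields if isinstance(d.get(f), list)}


def plan_trailing_predicate(path, multikey):
    """How a predicate on `path` would be handled under the proposal."""
    if path in multikey:
        return "unassigned"    # an array was indexed along this path: do not assign
    return "tight bounds"      # no arrays along this path: bounds are safe to build


docs = [
    {"loc": [0, 0], "b": "foo", "c": 1},
    {"loc": [1, 1], "b": ["foo", "bar"], "c": 2},
]
multikey = multikey_paths(docs, ["b", "c"])        # only "b" has an array value
plan_b = plan_trailing_predicate("b", multikey)    # "unassigned"
plan_c = plan_trailing_predicate("c", multikey)    # "tight bounds"
```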
| Comment by J Rassi [ 19/Nov/15 ] |
|
The root cause of this issue is a bug in the covering logic for compound 2d and compound text indexes. The index format for 2d and text indexes is unusual, in that arrays (on non-2d/non-text fields) are stored verbatim in the index key, instead of being exploded into separate index keys. To illustrate:
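As a rough model of the two key formats (a Python sketch, not server code), the difference can be shown on a single trailing field. The helpers `btree_keys` and `trailing_2d_keys` are hypothetical names, and the 2d/text portion of the key is left out entirely for simplicity.

```python
def btree_keys(doc, field):
    """Regular index: an array value is exploded into one key per element."""
    value = doc.get(field)
    return list(value) if isinstance(value, list) else [value]


def trailing_2d_keys(doc, field):
    """Trailing field of a 2d/text index: the value is stored verbatim, arrays included."""
    return [doc.get(field)]


doc = {"a": [0, 0], "b": ["foo", "bar", "baz"]}

exploded = btree_keys(doc, "b")        # three keys, one per array element
verbatim = trailing_2d_keys(doc, "b")  # a single key holding the whole array

# A point lookup for 'foo' matches an exploded key but not the verbatim array key.
found_in_btree = "foo" in exploded
found_in_2d = "foo" in verbatim
```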
As a result, predicates on these fields cannot generate bounds. For example, if we tried to generate bounds for the predicate {b: 'foo'}, we would get bounds of (b: ['foo', 'foo']), which would miss a document with a "b" value of ['foo', 'bar', 'baz']. Instead of generating bounds for these predicates, the access planner always treats them as having INEXACT_COVERED tightness, and attaches them as a filter to the appropriate index access stage. This is simply incorrect for non-coverable predicates (those with INEXACT_FETCH tightness), like {$exists: true}, {$eq: null}, etc. The correct behavior would be for the access planner to never attempt to cover, in this way, any predicate that would generate bounds with INEXACT_FETCH tightness.
Simple reproducer with $near predicate, compound 2d index:
Simple reproducer with non-$near predicate, compound 2d index:
Simple reproducer with $text predicate, compound text index:
|
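The reported symptom (all documents returned for $exists: true on b, where 0 were expected) can be modeled in a few lines. This is a simplified Python sketch and not one of the reproducers from the comment above: it assumes, as for ordinary MongoDB index keys, that a missing field is stored as null in the key, and `index_key` and `exists_true` are hypothetical helpers.

```python
def index_key(doc):
    """Trailing field 'b' as stored in the index key: a missing field becomes null (None)."""
    return {"b": doc.get("b")}


def exists_true(data):
    """{b: {$exists: true}} evaluated against whatever data it is handed."""
    return "b" in data


# No document in the collection has a 'b' field.
docs = [{"_id": 1, "loc": [0, 0]}, {"_id": 2, "loc": [1, 1]}]

# INEXACT_COVERED: the filter only sees index keys, where 'b' is always present.
covered = [d["_id"] for d in docs if exists_true(index_key(d))]

# INEXACT_FETCH: the filter sees the full document, so nothing matches.
fetched = [d["_id"] for d in docs if exists_true(d)]
```

In this model the covered plan matches every document because the key cannot distinguish a missing field from an explicit null, which is why such predicates need the fetched document.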
| Comment by Ramon Fernandez Marina [ 19/Oct/15 ] |
|
tredman@fb.com, I'm able to reproduce the behavior you describe and we're investigating. |