[SERVER-31444] Queries against multikey trailing fields of a compound 2d index are incorrectly covered, leading to incorrect results Created: 06/Oct/17 Updated: 06/Dec/22 Resolved: 13/Oct/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Geo, Querying |
| Affects Version/s: | 2.6.12, 3.0.15, 3.2.17, 3.4.9, 3.5.13 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | David Storch | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Query |
| Operating System: | ALL |
| Participants: |
| Description |
|
When indexing array fields, most indexes contain one key per element of the array, hence the terminology "multikey indexes" (documented here). However, compound "2d" indexes use a different format for the non-2d fields. Namely, all of the array elements are stored in a single key whose value is itself an array. Consider the following example:
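As an illustration of the two key formats, here is a minimal JavaScript sketch. It is a simplified model, not the server's key generator: the document `exampleDoc`, the helper names (`valuesAtPath`, `regularMultikeyKeys`, `twoDTrailingKey`) and the key layout are assumptions made for this sketch, and the "2d" (geohash) component of the key is left out.

```javascript
// Simplified model of index key generation; not the server's implementation.
// Hypothetical document: "a" is the 2d point, "b.c" reaches the values 1 and 2.
const exampleDoc = { a: [0, 0], b: [{ c: 1 }, { c: 2 }] };

// Collect every value reached by a dotted path, traversing arrays implicitly.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    const next = [];
    for (const value of current) {
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && part in item) {
          next.push(item[part]);
        }
      }
    }
    current = next;
  }
  return current;
}

// Regular multikey format: one index key per array element.
function regularMultikeyKeys(doc, path) {
  return valuesAtPath(doc, path).map((value) => ({ [path]: value }));
}

// Trailing field of a compound "2d" index (array case): all elements go into one key.
function twoDTrailingKey(doc, path) {
  return { [path]: valuesAtPath(doc, path) };
}

console.log(JSON.stringify(regularMultikeyKeys(exampleDoc, "b.c"))); // [{"b.c":1},{"b.c":2}]
console.log(JSON.stringify(twoDTrailingKey(exampleDoc, "b.c")));     // {"b.c":[1,2]}
```

In this model the regular format yields two keys for the document, while the "2d" trailing-field format yields one key whose "b.c" value is the whole array.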
As you can see, the inserted document leads to just a single index key. The value of this index key for the "b.c" field is the array [1, 2]. Because of this index format, predicates against the trailing fields of a compound "2d" index are not used to generate bounds against the index. A predicate like {"b.c": {$eq: 2}} would normally result in a point lookup in the index for the value 2. However, this would incorrectly miss the above document, because the index key value is the entire array [1, 2]. Instead, the predicate is attached as a filter to the IXSCAN stage, as you can see from the explain of the query below:
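The difference between a bounds-based point lookup and a filter over scanned keys can be sketched roughly as follows. This is an illustrative model only: `storedKeys`, `pointLookup` and `filterEq` are hypothetical names, and the real planner and matcher are considerably more involved.

```javascript
// Simplified model; the names and the key layout are assumptions for illustration.
// The single index key stored for the trailing field "b.c" (2d component omitted).
const storedKeys = [{ "b.c": [1, 2] }];

// Point lookup, as index bounds would do it: the stored value must equal the target.
function pointLookup(keys, path, target) {
  return keys.filter((key) => JSON.stringify(key[path]) === JSON.stringify(target));
}

// Filter applied to each scanned key: an array value matches if any element equals the target.
function filterEq(keys, path, target) {
  return keys.filter((key) => {
    const value = key[path];
    const candidates = Array.isArray(value) ? value : [value];
    return candidates.some((v) => v === target);
  });
}

console.log(pointLookup(storedKeys, "b.c", 2).length); // 0: bounds on 2 miss the key [1, 2]
console.log(filterEq(storedKeys, "b.c", 2).length);    // 1: the filter finds the key
```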
However, evaluating the filter against the index key rather than against the document is simply incorrect for certain kinds of queries, due to their matching behavior over arrays. For instance, queries which check equality against arrays may return spurious results:
Similarly, $type:"array" queries may return spurious results:
You can see that these results are spurious by issuing the same queries without the $geoWithin predicate and observing that the result set is empty:
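The mismatch behind both kinds of spurious result can be modeled in a few lines. This is a sketch under stated assumptions, not the server's matcher: `sampleDoc`, `sampleKey` and the two predicate helpers are hypothetical, and the array semantics of `$eq` and `$type` are deliberately simplified to "compare against each value the path reaches".

```javascript
// Simplified model of the two evaluation paths; all names are hypothetical.
const sampleDoc = { a: [0, 0], b: [{ c: 1 }, { c: 2 }] };

// The index key is treated as a flat document whose "b.c" field holds the whole array.
const sampleKey = { "b.c": [1, 2] };

// Values that the path "b.c" reaches in the document: one per element of "b".
const reachedValues = sampleDoc.b.map((elem) => elem.c);

// Stand-in for {"b.c": {$eq: [1, 2]}}.
const equalsOneTwo = (value) => JSON.stringify(value) === JSON.stringify([1, 2]);

// Stand-in for {"b.c": {$type: "array"}}.
const isArrayValue = (value) => Array.isArray(value);

const results = {
  eqOnDocument: reachedValues.some(equalsOneTwo),   // false: neither 1 nor 2 equals [1, 2]
  eqOnIndexKey: equalsOneTwo(sampleKey["b.c"]),     // true: the key value is [1, 2]
  typeOnDocument: reachedValues.some(isArrayValue), // false: 1 and 2 are not arrays
  typeOnIndexKey: isArrayValue(sampleKey["b.c"]),   // true: the key value is an array
};
console.log(results);
```

In this model the predicate evaluated against the document rejects it, while the same predicate evaluated against the index key accepts it, which is the spurious match.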
This problem is closely related to that reported in [issue link missing]. |
| Comments |
| Comment by David Storch [ 13/Oct/17 ] |
|
After further investigation, it appears that this is a duplicate of [issue link missing]. However, our investigation did turn up yet another problem related to the trailing fields of a "2d" index: see [issue link missing]. |
| Comment by David Storch [ 06/Oct/17 ] |
|
The easiest way to fix this issue is to stop assigning predicates to the trailing fields of "2d" indexes wholesale. However, this fix would likely result in a loss of performance for some workloads (and would render compound "2d" indexes more or less useless).

Another way to fix this issue would be to stop assigning predicates to a trailing field of a "2d" index when that field can contain an array. This would involve enabling path-level multikey tracking, taking advantage of the infrastructure added for [issue link missing].

A third way to fix this, and probably the most desirable (and most complex), is to change the index format for "2d" indexes. This would bring the planning code for "2d" in line with that for regular indexes, and would allow us to generate bounds over all "2d" indexes, even those whose non-2d fields contain arrays. However, this comes with a slew of upgrade/downgrade and mixed-version concerns.

Curiously, this problem does not affect indexes whose "2d" field is itself multikey:
The reason is that the access planner already refrains from adding covered filters to an IXSCAN stage when the index is tagged as multikey: |
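The guard described here, together with the second fix proposed earlier in this comment, can be sketched as follows. This is a hedged illustration rather than the planner's actual code: the `multikey` and `multikeyPaths` fields and both function names are assumptions made for this sketch, and `multikeyPaths` is populated as it might be if path-level tracking were enabled for "2d" indexes.

```javascript
// Simplified model of where a trailing-field predicate gets evaluated.
// "multikey" and "multikeyPaths" are assumed field names for this sketch.

// Behavior as described: only the index-level multikey flag is consulted.
function stageForFilter(index) {
  return index.multikey ? "FETCH" : "IXSCAN";
}

// Second proposed fix: also consult path-level multikey information.
function stageForFilterPathAware(index, path) {
  const pathHasArrays = (index.multikeyPaths[path] || []).length > 0;
  return index.multikey || pathHasArrays ? "FETCH" : "IXSCAN";
}

// Index whose trailing field "b.c" holds arrays; not tagged multikey.
const trailingArrayIndex = { multikey: false, multikeyPaths: { a: [], "b.c": ["b"] } };

// Index whose "2d" field is itself multikey; tagged multikey.
const multikeyGeoIndex = { multikey: true, multikeyPaths: { a: ["a"], "b.c": [] } };

console.log(stageForFilter(trailingArrayIndex));                 // IXSCAN: covered filter, may be wrong
console.log(stageForFilter(multikeyGeoIndex));                   // FETCH: filter sees the full document
console.log(stageForFilterPathAware(trailingArrayIndex, "b.c")); // FETCH: path-level check catches it
```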