ISSUE DESCRIPTION AND IMPACT
This patch, included in MongoDB 3.6, corrected covered query behavior by changing how empty and single-element arrays are treated with respect to multikey indexes:
- In MongoDB 3.4 and earlier, empty and single-element values like [] and [1] did not cause an index to become multikey. Further, covered queries using an index with these key values returned scalar values like null or 1, independent of whether the underlying documents contained array data or not.
- In versions 3.6 and later, new indexes over such values are multikey indexes, making them ineligible for covered query plans. As a result, covered queries are forced to use plans that return array values like [] or [1].
However, this change did not address the index metadata of existing indexes built on MongoDB versions 3.4 and earlier. This means that existing indexes containing empty or single-element arrays (like [] or [1]) are not guaranteed to be marked as multikey and may still incorrectly be eligible for use by covered queries.
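The difference between the two rules can be sketched in plain JavaScript. This is a simplified model for illustration only, not MongoDB's key generation code: the function names are hypothetical, the empty array is modeled as a single null key (an assumption chosen to match the null returned by covered queries), and numbers of different types that compare as equal are not modeled.

```javascript
// Simplified model of the multikey decision; NOT MongoDB source code.
// All names below are hypothetical and exist only for this sketch.

// Keys generated for one indexed field value. Arrays contribute one key per
// distinct element; an empty array is modeled as a single null key (assumption).
function indexKeys(value) {
  if (!Array.isArray(value)) return [value];
  if (value.length === 0) return [null];
  return Array.from(new Set(value));
}

// Rule as described for MongoDB 3.4 and earlier: the index becomes multikey
// only when a single value yields more than one distinct key.
function isMultikeyLegacy(value) {
  return indexKeys(value).length > 1;
}

// Rule as described for MongoDB 3.6 and later: any array value, including
// [] and [1], makes the index multikey.
function isMultikeyCurrent(value) {
  return Array.isArray(value);
}
```

Under this model, [], [1], and [1, 1] are not multikey by the legacy rule but are multikey by the current rule, while [1, 2] is multikey under both and the scalar 1 under neither.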
Applications are unlikely to be affected because:
- Data models that regularly use array data for the fields in affected indexes almost immediately "self-heal" the metadata inconsistency that this change allowed: after the upgrade to MongoDB 3.6, the first write of an array value to such an index correctly marks it as multikey.
- For indexes that did not self-heal, this issue incorrectly preserved the pre-3.6 behavior for these values. Applications have therefore always had to account for the possibility of both scalar and array values in query results.
DIAGNOSIS AND AFFECTED VERSIONS
When an index is affected by this change on MongoDB 3.6 or later, covered queries using that index still return scalar values instead of array values (1 instead of [1]). If at any time the index receives an insert of an array value (including an empty or single-element array), it becomes a multikey index and the correct behavior is enforced.
Until then, validate() reports the index as inconsistent because it detects the discrepancy between the following:
- The index is not marked as multikey.
- The index contains values that mean it should be multikey.
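As a rough aid for the check above, the following sketch picks out the indexes that a validate() result flags as invalid. It assumes the result carries an indexDetails sub-document with one { valid: <bool> } entry per index; the exact key names and the presence of that field vary by server version and validate options, so treat the shape as an assumption. The helper name invalidIndexes is hypothetical.

```javascript
// Hypothetical helper; assumes a validate() result shaped like
//   { valid: <bool>, indexDetails: { <index>: { valid: <bool> }, ... } }
// Returns the indexDetails keys whose entry is marked invalid.
function invalidIndexes(validateResult) {
  var details = validateResult.indexDetails || {};
  return Object.keys(details).filter(function (name) {
    return details[name].valid === false;
  });
}
```

In the mongo shell this could be called as invalidIndexes(c.validate(true)), where c is the collection from the example in the original description. An empty result only means that validate() reported no invalid index in the assumed format.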
REMEDIATION AND WORKAROUNDS
Data models that regularly use array data for the fields in affected indexes almost immediately "self-heal" the metadata inconsistency that this change allowed, as soon as a write occurs that would make the index multikey.
For other cases, the remediation for this issue is to rebuild affected indexes.
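One possible way to rebuild a single index from the mongo shell is to drop it and recreate it with the same key pattern and options. The sketch below uses the shell's getIndexes(), dropIndex(), and createIndex() collection methods; the helper name rebuildIndex is hypothetical, and the choice to strip the v and ns fields before recreating is an assumption of this sketch rather than a documented procedure.

```javascript
// Hypothetical helper: drop and recreate one index, preserving its options.
// `coll` is a shell collection object, `indexName` is e.g. "a_1".
function rebuildIndex(coll, indexName) {
  var spec = coll.getIndexes().filter(function (ix) {
    return ix.name === indexName;
  })[0];
  if (!spec) throw new Error("index not found: " + indexName);

  // Copy options such as name, unique, or sparse; skip fields that describe
  // the stored index rather than configure it (assumption of this sketch).
  var options = {};
  Object.keys(spec).forEach(function (k) {
    if (k !== "key" && k !== "v" && k !== "ns") options[k] = spec[k];
  });

  coll.dropIndex(indexName);
  return coll.createIndex(spec.key, options);
}
```

For the collection in the original description this would be rebuildIndex(c, "a_1"). Queries cannot use the index between the drop and the end of the rebuild, and the _id index cannot be dropped this way, so the approach fits best in a maintenance window.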
ORIGINAL DESCRIPTION
A multikey index is one with multiple distinct values for a field, and covered mode isn't used for multikey indexes. However, it is possible to have an array containing only one unique value and such arrays are currently handled incorrectly by the covered indexing implementation.
> c.drop()
true
> c.ensureIndex( { a:1 } )
> c.save( { a: [ 1 ] } )
> c.find( { a:1 }, { a:1, _id:0 } ).hint( { a:1 } )
{ "a" : 1 }
> c.find( { a:1 }, { a:1, _id:0 } ).hint( { $natural:1 } )
{ "a" : [ 1 ] }
> c.drop()
false
> c.ensureIndex( { a:1 } )
> c.save( { a: [ 1, 1 ] } )
> c.find( { a:1 }, { a:1, _id:0 } ).hint( { a:1 } )
{ "a" : 1 }
> c.find( { a:1 }, { a:1, _id:0 } ).hint( { $natural:1 } )
{ "a" : [ 1, 1 ] }
I believe there is a similar case where the array contains numbers that compare as equal but are of different data types.
- is duplicated by
  - SERVER-26921 Parallel array check in key generation is inconsistent with multikey definition (Closed)
  - SERVER-31932 The existence of index affects the type of projection field. (Closed)
- related to
  - SERVER-6293 Index only query fills in missing values with null (Closed)
  - SERVER-22400 Extend BtreeKeyGenerator::getKeys() to return which indexed paths are multikey (Closed)