Issue Description:
In certain situations, compound text indexes in MongoDB 3.4 may have incorrectly indexed documents in which a non-text prefix or suffix field is multikey (that is, the field contains an array). Documents containing an array in a non-text field referenced by a compound text index will not be returned by queries that use this index.
Expected Behavior: Documents containing multikey values for the prefix or suffix fields should have been rejected by the index, causing the index build to fail or rejecting certain inserts to the collection.
Indexes Affected
This only affects compound text indexes where the prefix or suffix fields are non-text.
For example:
db.collection.createIndex({ "a.b" : 1, "c" : "text" })
Other compound text indexes are unaffected.
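Example (on an affected deployment, the final query does not return the inserted document even though it matches):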
db.foo.drop()
db.foo.createIndex({ "a.b" : 1, "a.c" : "text" })
db.foo.insert({ "a" : [ { "b" : 1, "c": "foo" }, { "b" : 2, "c" : "bar" } ] })
db.foo.find({ "a.b" : 1, "$text" : { "$search" : "foo" } })
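To make "multikey" concrete, the following is a minimal client-side sketch that checks whether a dotted path such as "a.b" passes through an array in a given document. The helper name hasArrayOnPath is hypothetical and not part of MongoDB; it only mirrors the condition described above for the non-text fields of the index.

```javascript
// Hypothetical helper (not a MongoDB API): returns true if any component
// along the dotted path resolves to an array in the document, which is the
// condition that makes the path multikey for that document.
function hasArrayOnPath(doc, path) {
  var value = doc;
  var parts = path.split(".");
  for (var i = 0; i < parts.length; i++) {
    // Stop if the path cannot be followed any further.
    if (value === null || typeof value !== "object") return false;
    value = value[parts[i]];
    if (Array.isArray(value)) return true;
  }
  return false;
}

// The document from the example above: "a" is an array, so "a.b" is multikey.
var multikeyDoc = { "a" : [ { "b" : 1, "c" : "foo" }, { "b" : 2, "c" : "bar" } ] };
// A document the index can represent: "a" is a single embedded document.
var scalarDoc = { "a" : { "b" : 1, "c" : "foo" } };
```

Here hasArrayOnPath(multikeyDoc, "a.b") is true and hasArrayOnPath(scalarDoc, "a.b") is false.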
Affected Versions
- 3.4.0 - 3.4.24 - Compound text indexes allowed invalid index keys.
- 3.6.0 and later - Compound text indexes no longer allow invalid index keys. However, indexes already containing invalid index keys would continue to be affected until those index keys were removed or the index was rebuilt. See the Remediation section below.
Diagnosis
Run the validate() command on collections with compound text indexes. If a collection is impacted, the command returns the following error:
- text index contains an array in document
Note the performance considerations of validate before proceeding. If you already know which indexes are impacted, proceed to remediation.
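The diagnosis step can be sketched as a small shell helper that runs validate() only on collections that have a compound text index. This is an illustration under stated assumptions rather than an official procedure: the function names are hypothetical, the sketch assumes that getIndexes() reports text indexes with the internal "_fts" and "_ftsx" key fields, and it assumes the error message appears in the "errors" or "warnings" arrays of the validate() result.

```javascript
// Hypothetical helper: true if an index spec (as returned by getIndexes())
// is a text index that also has non-text prefix or suffix fields.
// Assumption: text indexes are reported with the "_fts" / "_ftsx" key fields.
function isCompoundTextIndex(indexSpec) {
  var key = indexSpec.key || {};
  if (key._fts !== "text") return false;
  return Object.keys(key).some(function (field) {
    return field !== "_fts" && field !== "_ftsx";
  });
}

// Hypothetical helper: validates each collection that has a compound text
// index and reports those whose result mentions the error described above.
// Assumption: the message is surfaced in result.errors or result.warnings.
function findImpactedCollections(database) {
  var impacted = [];
  database.getCollectionNames().forEach(function (name) {
    var coll = database.getCollection(name);
    var candidates = coll.getIndexes().filter(isCompoundTextIndex);
    if (candidates.length === 0) return; // skip: no compound text index
    var result = coll.validate();
    var messages = (result.errors || []).concat(result.warnings || []);
    var hit = messages.some(function (m) {
      return /text index contains an array/.test(m);
    });
    if (hit) {
      impacted.push({
        collection: name,
        indexes: candidates.map(function (ix) { return ix.name; })
      });
    }
  });
  return impacted;
}

// In the mongo shell: printjson(findImpactedCollections(db))
```

Because validate() can be expensive, restricting it to collections with a compound text index keeps the sweep as small as possible. Views and other namespaces that cannot be validated may need to be skipped by hand.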
Remediation
If validate() reports that a "text index contains an array", this issue can be remediated by rebuilding the affected indexes after removing or updating any documents containing data that would be incompatible with the index. The following steps may help to guide this effort.
1. Use $type to query the collection for documents containing multikey values in the index prefix. For example, for the index { "a.b" : 1, "c" : "text" }, issuing the following query may identify all documents with multikey values for "a.b":
db.collection.find({ "$or" : [ { "a": { "$type" : 4 } }, { "a.b" : { "$type" : 4 } } ] } )
You may also want to consider creating partial indexes to improve the performance of these operations. For example:
db.collection.createIndex({ "a" : 1 }, { "partialFilterExpression" : { "a" : { "$type" : 4 } } } )
db.collection.createIndex({ "a.b" : 1 }, { "partialFilterExpression" : { "a.b" : { "$type" : 4 } } } )
2. Update or remove these documents so that the collection no longer contains the invalid multikey values.
3. Drop and rebuild the compound text index.
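For indexes whose non-text fields are nested more deeply than "a.b", the $or query from step 1 can be generated rather than written by hand. The following is a minimal sketch with a hypothetical helper name, buildArrayTypeQuery; it produces one { "$type" : 4 } clause for every prefix of the dotted path. It assumes a server version in which { "$type" : 4 } matches fields that are themselves arrays, which is the behaviour documented for 3.6 and later.

```javascript
// Hypothetical helper (not a MongoDB API): builds the $or query from step 1
// for an arbitrary dotted path, with one $type clause per path prefix.
function buildArrayTypeQuery(path) {
  var parts = path.split(".");
  var clauses = [];
  for (var i = 1; i <= parts.length; i++) {
    var clause = {};
    // e.g. "a", then "a.b", then "a.b.c", ...
    clause[parts.slice(0, i).join(".")] = { "$type" : 4 };
    clauses.push(clause);
  }
  return { "$or" : clauses };
}

// For "a.b" this yields the same filter as the query shown in step 1:
// { "$or" : [ { "a" : { "$type" : 4 } }, { "a.b" : { "$type" : 4 } } ] }
var exampleQuery = buildArrayTypeQuery("a.b");
```

In the mongo shell the result can be passed straight to find(), for example db.collection.find(buildArrayTypeQuery("a.b")), once for each non-text field of the affected index.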