[SERVER-27392] text index with multi-key prefix is not rejected Created: 13/Dec/16 Updated: 17/Jan/24 Resolved: 01/Feb/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Text Search |
| Affects Version/s: | 3.4.0 |
| Fix Version/s: | 3.5.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Cyril Auburtin | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
|||||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | |||||||||||||||||||||||||||||||||||||
| Operating System: | ALL | |||||||||||||||||||||||||||||||||||||
| Steps To Reproduce: | Issue summary for Issue Description:In certain situations, compound text indexes in MongoDB 3.4 may have incorrectly indexed documents where either non-text prefix or suffix fields are multikey. Documents containing an array in fields referenced by a compound text index will not be returned by queries which use this index. Expected Behavior: Documents containing multikey values for the prefix or suffix fields should have been rejected by the index, causing the index build to fail or rejecting certain inserts to the collection. Indexes AffectedThis only affects compound text indexes where the prefix or suffix fields are non-text. For example:
Other compound text indexes are unaffected. Example:
Affected Versions
DiagnosisRun the validate() command on collections with compound text indexes. The command will return the following error if impacted:
RemediationIf validate() reports that a "text index contains an array", this issue can be remediated by re-building the affected indexes after removing or updating any documents containing data that would be incompatible with the index. The following steps may help to guide this effort. 1. Use $type to query the collection for documents containing multi-key values in the index prefix. For example, for the index { "a.b" : 1, "c" : "text" }, issuing the following query may identify all documents with multi-key values for "a.b":
You may also want to consider creating partial indexes to improve the performance of these operations. For example:
2. Update or remove these documents to remove the invalid multi-key values from the collection. 3. Drop and rebuild the compound text index. Previous description:
|
|||||||||||||||||||||||||||||||||||||
| Sprint: | Query 2017-01-23, Query 2017-02-13 | |||||||||||||||||||||||||||||||||||||
| Participants: | ||||||||||||||||||||||||||||||||||||||
| Description |
|
When doing a AND query between text search and another field, the other field seems ignored Below, there should be only one result |
| Comments |
| Comment by Githook User [ 01/Feb/17 ] | |||||||||||||||||
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: | |||||||||||||||||
| Comment by Cyril Auburtin [ 13/Dec/16 ] | |||||||||||||||||
|
Thanks Thomas, great answers, This way of embedding multiple translations in the same document is really practical, and efficient, but you can't avoid this (rare) case of false-friends matches like shown in first example. It would be great if Text search could give which language matched and be able to $and/filter it I could store simple documents (not nested), then I can avoid false-friends, but it's not really good for performance I think, because you need to link those small multilanguage documents to the main document, then do _id: {$in: allIdsFound} + other queries in the main collection, without being able to .limit or .skip for finding this ids, and then having to possibly repopulated after... Anyway, this is getting far from the issue to raised, just wanting to explain the context | |||||||||||||||||
| Comment by Kelsey Schubert [ 13/Dec/16 ] | |||||||||||||||||
|
Hi caub, Thank you for the report. The behavior you are observing with the first index is expected. $language:'fr' changes the tokenizer on the text search, and x.language':'fr' matches against any subdocument in the array containing this equality. Consequently it is is expected that
is returned since the first document in the array contains the name fart and the second document in the array has language set to "fr". For more information, please review our documentation. The second example you have presented is a bug, and I will be modifying this ticket to highlight this issue. The index
does not support multikey values on x.language, and MongoDB should reject the index in this case. See the example below for its expected behavior:
Please continue to watch for updates regarding the fix for this issue. Kind regards, | |||||||||||||||||
| Comment by Cyril Auburtin [ 13/Dec/16 ] | |||||||||||||||||
|
I've also tried:
| |||||||||||||||||
| Comment by Cyril Auburtin [ 13/Dec/16 ] | |||||||||||||||||
|
steps to reproduce:
|