Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-27392

text index with multi-key prefix is not rejected

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 3.5.3
    • Affects Version/s: 3.4.0
    • Component/s: Text Search
    • Labels:
      None
    • Fully Compatible
    • ALL
    • Hide
      Issue summary for SERVER-27392

      Issue Description:

      In certain situations, compound text indexes in MongoDB 3.4 may have incorrectly indexed documents where either non-text prefix or suffix fields are multikey. Documents containing an array in fields referenced by a compound text index will not be returned by queries which use this index.

      Expected Behavior: Documents containing multikey values for the prefix or suffix fields should have been rejected by the index, causing the index build to fail or rejecting certain inserts to the collection.

      Indexes Affected

      This only affects compound text indexes where the prefix or suffix fields are non-text.

      For example:

      collection.createIndex({ "a.b" : 1, "c" : "text" })
      

      Other compound text indexes are unaffected.

      Example:

      // Only occurrs in MongoDB 3.4
      db.foo.drop()
      db.foo.createIndex({ "a.b" : 1, "a.c" : "text" })
      
      // Should be rejected due to multikey index prefix but is accepted
      db.foo.insert({ "a" : [ { "b" : 1, "c": "foo" }, { "b" : 2, "c" : "bar" } ] })
      
      // As a result, the indexed value for 
      db.foo.find({ "a.b" : 1, "$text" : { "$search" : "foo" } })
      // nothing is returned
      

      Affected Versions

      • 3.4.0 - 3.4.24 - Compound text indexes allowed invalid index keys.
      • 3.6.0 and later - Compound text indexes no longer allow invalid index keys. However, indexes already containing invalid index keys would continue to be affected until those index keys were removed or the index was rebuilt. See the Remediation section below.

      Diagnosis

      Run the validate() command on collections with compound text indexes. The command will return the following error if impacted:

      • text index contains an array in document

      Remediation

      If validate() reports that a "text index contains an array", this issue can be remediated by re-building the affected indexes after removing or updating any documents containing data that would be incompatible with the index. The following steps may help to guide this effort.

      1. Use $type to query the collection for documents containing multi-key values in the index prefix. For example, for the index { "a.b" : 1, "c" : "text" }, issuing the following query may identify all documents with multi-key values for "a.b":

      	db.collection.find({ "$or" : [ { "a": { "$type" : 4 } }, { "a.b" : { "$type" : 4 } } ] } ) 
      

      You may also want to consider creating partial indexes to improve the performance of these operations. For example:

      	db.collection.createIndex({ "a" : 1 }, { "partialFilterExpression" : { "a" : { "$type" : 4 } } } ) 
      	db.collection.createIndex({ "a.b" : 1 }, { "partialFilterExpression" : { "a" : { "$type" : 4 } } } )
      

      2. Update or remove these documents to remove the invalid multi-key values from the collection.

      3. Drop and rebuild the compound text index.


      Previous description:

      > db.a.find()
      > db.a.createIndex({'x.name':'text'})
      {
              "createdCollectionAutomatically" : true,
              "numIndexesBefore" : 1,
              "numIndexesAfter" : 2,
              "ok" : 1
      }
      > db.a.insert({i:1, x:[{name:'pet', language:'en'}, {name:'animal domestique', language:'fr'}]})
      WriteResult({ "nInserted" : 1 })
      > db.a.insert({i:1, x:[{name:'fart', language:'en'}, {name:'pet', language:'fr'}]})
      WriteResult({ "nInserted" : 1 })
      > db.a.insert({i:1, x:[{name:'wax', language:'en'}, {name:'farter', language:'fr'}]})
      WriteResult({ "nInserted" : 1 })
      > db.a.find({$text:{$search:'fart', $language:'fr'}, 'x.language':'fr'})
      { "_id" : ObjectId("584868134fb941395745c6bb"), "i" : 1, "x" : [ { "name" : "wax", "language" : "en" }, { "name" : "farter", "language" : "fr" } ] }
      { "_id" : ObjectId("584867ec4fb941395745c6ba"), "i" : 1, "x" : [ { "name" : "fart", "language" : "en" }, { "name" : "pet", "language" : "fr" } ] }
      > db.a.dropIndexes()
      > db.a.createIndex({'x.language':1, 'x.name': 'text'})
      > db.a.find({'x.language':'fr', $text:{$search: 'fart', $language:'fr'}})
      > db.a.find({'x.language':'fr', $text:{$search: 'fart', $language:'en'}})
      > db.a.find({'x.language':'en', $text:{$search: 'fart', $language:'en'}})
      // empty results
      
      Show
      Issue summary for SERVER-27392 Issue Description : In certain situations, compound text indexes in MongoDB 3.4 may have incorrectly indexed documents where either non-text prefix or suffix fields are multikey. Documents containing an array in fields referenced by a compound text index will not be returned by queries which use this index. Expected Behavior : Documents containing multikey values for the prefix or suffix fields should have been rejected by the index, causing the index build to fail or rejecting certain inserts to the collection. Indexes Affected This only affects compound text indexes where the prefix or suffix fields are non-text. For example: collection.createIndex({ "a.b" : 1, "c" : "text" }) Other compound text indexes are unaffected. Example : // Only occurrs in MongoDB 3.4 db.foo.drop() db.foo.createIndex({ "a.b" : 1, "a.c" : "text" }) // Should be rejected due to multikey index prefix but is accepted db.foo.insert({ "a" : [ { "b" : 1, "c" : "foo" }, { "b" : 2, "c" : "bar" } ] }) // As a result, the indexed value for db.foo.find({ "a.b" : 1, "$text" : { "$search" : "foo" } }) // nothing is returned Affected Versions 3.4.0 - 3.4.24 - Compound text indexes allowed invalid index keys. 3.6.0 and later - Compound text indexes no longer allow invalid index keys. However, indexes already containing invalid index keys would continue to be affected until those index keys were removed or the index was rebuilt. See the Remediation section below. Diagnosis Run the validate() command on collections with compound text indexes . The command will return the following error if impacted: text index contains an array in document Remediation If validate() reports that a "text index contains an array", this issue can be remediated by re-building the affected indexes after removing or updating any documents containing data that would be incompatible with the index. The following steps may help to guide this effort. 1. Use $type to query the collection for documents containing multi-key values in the index prefix. For example, for the index { "a.b" : 1, "c" : "text" } , issuing the following query may identify all documents with multi-key values for "a.b" : db.collection.find({ "$or" : [ { "a" : { "$type" : 4 } }, { "a.b" : { "$type" : 4 } } ] } ) You may also want to consider creating partial indexes to improve the performance of these operations. For example: db.collection.createIndex({ "a" : 1 }, { "partialFilterExpression" : { "a" : { "$type" : 4 } } } ) db.collection.createIndex({ "a.b" : 1 }, { "partialFilterExpression" : { "a" : { "$type" : 4 } } } ) 2. Update or remove these documents to remove the invalid multi-key values from the collection. 3. Drop and rebuild the compound text index. Previous description: > db.a.find() > db.a.createIndex({'x.name':'text'}) { "createdCollectionAutomatically" : true, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 } > db.a.insert({i:1, x:[{name:'pet', language:'en'}, {name:'animal domestique', language:'fr'}]}) WriteResult({ "nInserted" : 1 }) > db.a.insert({i:1, x:[{name:'fart', language:'en'}, {name:'pet', language:'fr'}]}) WriteResult({ "nInserted" : 1 }) > db.a.insert({i:1, x:[{name:'wax', language:'en'}, {name:'farter', language:'fr'}]}) WriteResult({ "nInserted" : 1 }) > db.a.find({$text:{$search:'fart', $language:'fr'}, 'x.language':'fr'}) { "_id" : ObjectId("584868134fb941395745c6bb"), "i" : 1, "x" : [ { "name" : "wax", "language" : "en" }, { "name" : "farter", "language" : "fr" } ] } { "_id" : ObjectId("584867ec4fb941395745c6ba"), "i" : 1, "x" : [ { "name" : "fart", "language" : "en" }, { "name" : "pet", "language" : "fr" } ] } > db.a.dropIndexes() > db.a.createIndex({'x.language':1, 'x.name': 'text'}) > db.a.find({'x.language':'fr', $text:{$search: 'fart', $language:'fr'}}) > db.a.find({'x.language':'fr', $text:{$search: 'fart', $language:'en'}}) > db.a.find({'x.language':'en', $text:{$search: 'fart', $language:'en'}}) // empty results
    • Query 2017-01-23, Query 2017-02-13

      When doing a AND query between text search and another field, the other field seems ignored

      Below, there should be only one result

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            caub Cyril Auburtin
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated:
              Resolved: