  1. Core Server
  2. SERVER-18824

Support matching text that has embedded NUL bytes with $regex

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.2
    • Component/s: Querying
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:
      mongodb shell test code:

      // test in version 3.0.3 and version 2.4.7

      > db.version()
      3.0.3
      > use testdb
      switched to db testdb
      > db
      testdb
      > db.test.find()
      > db.test.save({_id:"a","tag":"a"})
      WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : "a" })
      > db.test.find()
      { "_id" : "a", "tag" : "a" }
      > db.test.save({_id:"a\x00","tag":"a0"})
      WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : "a\u0000" })
      > db.test.find()
      { "_id" : "a", "tag" : "a" }
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id: "a"})
      { "_id" : "a", "tag" : "a" }
      > db.test.find({_id: "a\x00"})
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id: "a\u0000"})
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id: "a\0"})
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a"}}) // correct
      { "_id" : "a", "tag" : "a" }
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a\x00"}})      // here { "_id" : "a", "tag" : "a" } is unexpected
      { "_id" : "a", "tag" : "a" }
      { "_id" : "a\u0000", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a\u0000"}})   // and also here the first item is unexpected
      { "_id" : "a", "tag" : "a" }
      { "_id" : "a\u0000", "tag" : "a0" }
      >
       
      // test in 1.8.3 (correct)
       
      > db.version()
      1.8.3
      > use testdb
      switched to db testdb
      > db
      testdb
      > db.test.find()
      > db.test.save({_id:"a","tag":"a"})
      > db.test.find()
      { "_id" : "a", "tag" : "a" }
      > db.test.save({_id:"a\x00","tag":"a0"})
      > db.test.find()
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id: "a"})
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id: "a\x00"})
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id: "a\u0000"})
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id: "a\0"})
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a"}}) // here an item is lost
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a\x00"}}) // correct
      { "_id" : "a", "tag" : "a0" }
      > db.test.find({_id:{"$regex":"^a\u0000"}}) // correct
      { "_id" : "a", "tag" : "a0" }
      >
      

    • Sprint:
      Query 10 (02/22/16)

      Description

      There is a bug in MongoDB's integration with the PCRE library, present since the integration was first written, that causes the string data stored in a document to be truncated at the first NUL byte during pattern matching. This line is the cause of the issue: it ends up using the StringPiece(const char* str) constructor, which calls strlen(), so pattern matching on the string data stops at the first NUL byte.

      We should instead use the StringPiece(const char* offset, int len) constructor and pass e.valuestrsize() - 1 as the length of the string data.
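The difference between the two constructors can be modeled outside the server. The following is only a sketch: JavaScript strings stand in for the raw bytes of the stored value, JavaScript's RegExp stands in for PCRE, and the helper names viewWithStrlen and viewWithLength are hypothetical. It also assumes, as the "- 1" above suggests, that valuestrsize() counts the trailing NUL terminator.

```javascript
// Hypothetical model, not MongoDB source code.

// Models StringPiece(const char* str): the length comes from strlen(),
// so the view ends at the first NUL byte.
function viewWithStrlen(raw) {
  const nul = raw.indexOf("\u0000");
  return nul === -1 ? raw : raw.slice(0, nul);
}

// Models StringPiece(const char* offset, int len): the caller supplies
// the length, so embedded NUL bytes stay inside the view.
function viewWithLength(raw, len) {
  return raw.slice(0, len);
}

// The stored value "a\0" followed by its trailing NUL terminator.
const raw = "a\u0000" + "\u0000";
// Assumption: the reported size includes the terminator.
const valuestrsize = raw.length;

// Pattern with the NUL written as an escape sequence.
const pattern = new RegExp("^a\\x00");

// strlen()-style view: only "a" is visible to the matcher.
const truncated = viewWithStrlen(raw);
// Explicit-length view: "a" followed by the embedded NUL is visible.
const full = viewWithLength(raw, valuestrsize - 1);

const matchesTruncated = pattern.test(truncated);
const matchesFull = pattern.test(full);
```

In this model the strlen()-style view hides the embedded NUL from the matcher, so the escaped pattern cannot match it, while the explicit-length view keeps it visible.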

      Additionally, it is worth noting that PCRE patterns cannot contain embedded NUL bytes. Instead, they need to be escaped as

      "\\000",
      "\\x00",
      

      etc. See my comment below for more details.
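On the client side, one way to follow this advice is to rewrite raw NUL characters in a pattern string as the escape sequence \x00 before passing it to $regex. The helper below is a minimal sketch under that assumption; escapeNulForRegex is a hypothetical name, not part of the shell or any driver, and it does not handle patterns that use \Q...\E quoting.

```javascript
// Hypothetical helper: replaces each raw NUL character in a regex pattern
// string with the four-character escape sequence \x00.
function escapeNulForRegex(pattern) {
  let out = "";
  // True when the previous character was a backslash that starts an escape.
  let escaped = false;
  for (const ch of pattern) {
    if (ch === "\u0000") {
      // If a backslash already precedes the NUL, reuse it: "\" + NUL -> "\x00".
      out += escaped ? "x00" : "\\x00";
      escaped = false;
    } else {
      out += ch;
      escaped = !escaped && ch === "\\";
    }
  }
  return out;
}

// The pattern from the report, with the raw NUL rewritten.
const safePattern = escapeNulForRegex("^a\u0000");

// In the mongo shell this could then be used as:
//   db.test.find({_id: {$regex: safePattern}})
```

The pattern string that reaches the server then contains only printable characters, so it is not affected by C-string handling of the pattern itself.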


      Original description

      In version 2.4.7 and 3.0.3:

      When a value contains the special character '\u0000' (\x00, \0), a prefix search such as "^a\u0000" also returns an item such as "a", which does not have the prefix "a\u0000".

      In version 1.8.3:

      When searching with "^a", the item "a\u0000" is not in the result set.

              People

              Assignee:
              max.hirschhorn Max Hirschhorn
              Reporter:
              ma6174 ma6174