Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-37575

BinData compare uses the Base64 string not the raw bytes

    • Type: Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Priority: Icon: Minor - P4 Minor - P4
    • None
    • Affects Version/s: 3.4.16
    • Component/s: None
    • None
    • Environment:
      Linux x64
    • ALL
    • Hide

      I can make 3 binary encoded IPv6 addresses like this:
       

      > var a = BinData(0,"JA4A8gAxqTwciCuF5GGzAA==")
      > var b = BinData(0,"JA4A8gAxqTwciCuF5GGzTw==")
      > var c = BinData(0,"JA4A8gAxqTwciCuF5GGz/w==")

      You can confirm these sort properly a < b < c in hex like this:

      > a.hex()
      240e00f20031a93c1c882b85e461b300
      > b.hex()
      240e00f20031a93c1c882b85e461b34f
      > c.hex()
      240e00f20031a93c1c882b85e461b3ff

      And of course as you would expect the shell agrees:

      > a.hex() < b.hex()
      true
      > b.hex() < c.hex()
      true
      > a.hex() < c.hex()
      true

      However if I compare the objects themselves I get a peculiar answer:

      > a < b
      true
      > b < c
      false
      > a < c
      false

      After some digging this peculiar result seemed to match the string-compare of the base-64 encoded data:

      > a.toString() < b.toString()
      true
      > b.toString() < c.toString()
      false
      > a.toString() < c.toString()
      false
       

      Which tragically is inconsistent with the "byte-wise lexicographic sort" implied by the documentation. Anecdotal evidence suggests that server-side sorting has the same behavior.

      Show
      I can make 3 binary encoded IPv6 addresses like this:   > var a = BinData(0,"JA4A8gAxqTwciCuF5GGzAA==") > var b = BinData(0,"JA4A8gAxqTwciCuF5GGzTw==") > var c = BinData(0,"JA4A8gAxqTwciCuF5GGz/w==") You can confirm these sort properly a < b < c in hex like this: > a.hex() 240e00f20031a93c1c882b85e461b300 > b.hex() 240e00f20031a93c1c882b85e461b34f > c.hex() 240e00f20031a93c1c882b85e461b3ff And of course as you would expect the shell agrees: > a.hex() < b.hex() true > b.hex() < c.hex() true > a.hex() < c.hex() true However if I compare the objects themselves I get a peculiar answer: > a < b true > b < c false > a < c false After some digging this peculiar result seemed to match the string-compare of the base-64 encoded data: > a.toString() < b.toString() true > b.toString() < c.toString() false > a.toString() < c.toString() false   Which tragically is inconsistent with the "byte-wise lexicographic sort" implied by the documentation. Anecdotal evidence suggests that server-side sorting has the same behavior.

      https://docs.mongodb.com/manual/reference/bson-type-comparison-order/#bindata

      Says MongoDB sorts BinData in the following order:

      .... Finally, by the data, performing a byte-by-byte comparison.
       
      We had engineered this around the expectation that this order would compare the bytes in the raw data however we have discovered to our dismay that it appears to be comparing the Base64-encoded strings instead which is not the same order as comparing the bytes of the raw data.
      Note that this this doesn't just apply to the shell; I was led to this conclusion when queries were not returning data I was expecting and assume I could demonstrate the same behavior with a server-side query as well.

      This is actually a serious bug because it means you can't properly sequence binary data (like IPv6 addresses) in a way that supports range compares but I have no expectation that this will actually be fixed –  I suspect instead that this will turn into a documentation bug that warns others off from using this datatype.

       

            Assignee:
            asya.kamsky@mongodb.com Asya Kamsky
            Reporter:
            glen.miner Glen Miner
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved: