Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-5065

a missing field nested within an array is sorted as 'null', even if the same field exists inside another array element

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.0.2, 2.1.0
    • Component/s: Querying
    • Labels:
    • ALL

      Background

      Sorting

      If there are multiple values for a field, only one of them is used to sort a document. For example when sorting the document { a:[ 1, 3, 2 ] } using sort spec { a:1 } only one of the values of 'a' is used to order the document. The value of 'a' chosen for sorting is the first that would be encountered while traversing an index on the sort spec (in this case { a:1 }) to resolve the query. For example if the query is {}, the first value of 'a' encountered on index { a:1 } is 1. If the query is { a:{ $gte:2 } } the first value is 2. Note that this policy is implemented regardless of whether there is actually an index { a:1 } or if sorting is performed using the { a:1 } ordering in memory, without an index. (The results must be the same regardless of whether or not an index is used.)

      Consider the case of document { a:[ 1, null, 2 ] }. In this example, 2 is the largest value of 'a'. It is the first value of 'a' that would be encountered for index { a:-1 } and is the value used for sorting. The smallest value of 'a' is 'null'. It is the first value of 'a' that would be encountered for index { a:1 } and is the one used for sorting. Furthermore, given two documents { _id:'x', a:[ 5, null ] } and { _id:'y', a:[ 2, 3 ] }, if sorting is performed according to order { a:1 } the documents will be ordered with document x followed by document y because null is less than 2.

      This behavior was requested in SERVER-480.

      Index Key Extraction

      In many cases, indexing a nested field within an array uses an existing value from the document. Eg for index { 'a.b':1 }

      document { a:[ { b:5 } ] } -> produces index key 'a.b':5
      { a:[ { b:5 }, { b:6 } ] } -> two keys 'a.b':5, 'a.b':6

      If the nested field is missing however, a null value is stored in the index

      document { } (no 'a' field present) -> produces index key 'a.b':null
      { a:[ ] } -> 'a.b':null
      { a:[ {} ] } -> 'a.b':null
      { a:[ { x:1 } ] } -> 'a.b':null
      { a:[ 7 ] } -> 'a.b':null

      In the case where some array values have an 'a.b' field and some do not, a mixture of null and non null index keys is produced:

      { a:[ { b:5 }, {} ] } -> 'a.b':5, 'a.b':null
      { a:[ { b:5 }, { b:6 }, 99 ] } -> 'a.b':5, 'a.b':6 'a.b':null

      Query Matching Semantics

      For a simple query, a request for null will match missing values only if there are no non missing values for the key. But a request for null will always match an explicit null value. For query { 'a.b':null }

      { } (empty document) matches
      { a:[ ] } matches
      { a:[ 88 ] } matches
      { a:[ { b:2 } ] } does not match because there is an existing value of 'a.b'
      { a:[ { b:2 }, { } ] } does not match because there is an existing value of 'a.b', even though there is also a missing 'b' within another array element
      { a:[ { b:2 }, { b:null } ] } does match because there is an explicit null value of 'a.b'.

      However, the $elemMatch operator will restrict matching to individual array elements. If a single array element has a missing 'b', the document will match null. For query { a:{ $elemMatch:{ b:null } } }

      { a:[ { b:2 } ] } does not match because the only value of b is 2
      { a:[ { b:2 }, { } ] } matches because there is a missing value of 'b' in the second array entry (note difference from non elemMatch query example above)
      { a:[ { b:2 }, { b:null } ] } does match because there is an explicit null value of 'a.b'.

      This behavior was requested in SERVER-3377.

      Observed behavior in this ticket:
      When nested fields are missing within some elements of an array, they are indexed as null. One such null value may be used to determine document ordering when a sort is applied.

      Proposed behavior for this ticket:
      A null value representing a missing field will only be used for sorting if the field is missing from all nested objects within an array, unless the user performs an $elemMatch null query of the type described above.

      Aaron

      -----------------------------------------------------------------------------------------------

      The attached test case documents and reproduces the problem, but the tl;dr is that if you have a colleciton with documents like:

      { 'x' : [ { 'subfield': x }, { 'other_subfield': a } ] }
      { 'x' : [ { 'subfield': y }, { 'other_subfield': b } ] }
      { 'x' : [ { 'subfield': z }, { 'other_subfield': c } ] }

      Then trying to sort on x.subfield works in a DESCENDING sort, but fails to order the records on the selected subfield in an ASCENDING sort.

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            andrew.morrow@mongodb.com Andrew Morrow (Inactive)
            Votes:
            3 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved: