[SERVER-5065] a missing field nested within an array is sorted as 'null', even if the same field exists inside another array element Created: 24/Feb/12  Updated: 17/Aug/17  Resolved: 15/Aug/17

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.0.2, 2.1.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andrew Morrow (Inactive) Assignee: David Storch
Resolution: Done Votes: 3
Labels: query_triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongo_query_order.py    
Issue Links:
Depends
Related
related to SERVER-24310 Adding NULL sort order configuration ... Backlog
is related to SERVER-18861 Queries matching null value should be... Backlog
is related to SERVER-19402 Change semantics of sorting by array ... Closed
Operating System: ALL
Participants:

 Description   

Background

Sorting

If there are multiple values for a field, only one of them is used to sort a document. For example when sorting the document { a:[ 1, 3, 2 ] } using sort spec { a:1 } only one of the values of 'a' is used to order the document. The value of 'a' chosen for sorting is the first that would be encountered while traversing an index on the sort spec (in this case { a:1 }) to resolve the query. For example if the query is {}, the first value of 'a' encountered on index { a:1 } is 1. If the query is { a:{ $gte:2 } } the first value is 2. Note that this policy is implemented regardless of whether there is actually an index { a:1 } or if sorting is performed using the { a:1 } ordering in memory, without an index. (The results must be the same regardless of whether or not an index is used.)

Consider the case of document { a:[ 1, null, 2 ] }. In this example, 2 is the largest value of 'a'. It is the first value of 'a' that would be encountered for index { a:-1 } and is the value used for sorting. The smallest value of 'a' is 'null'. It is the first value of 'a' that would be encountered for index { a:1 } and is the one used for sorting. Furthermore, given two documents { _id:'x', a:[ 5, null ] } and { _id:'y', a:[ 2, 3 ] }, if sorting is performed according to order { a:1 } the documents will be ordered with document x followed by document y because null is less than 2.

This behavior was requested in SERVER-480.

Index Key Extraction

In many cases, indexing a nested field within an array uses an existing value from the document. Eg for index { 'a.b':1 }

document { a:[ { b:5 } ] } -> produces index key 'a.b':5
{ a:[ { b:5 }, { b:6 } ] } -> two keys 'a.b':5, 'a.b':6

If the nested field is missing however, a null value is stored in the index

document { } (no 'a' field present) -> produces index key 'a.b':null
{ a:[ ] } -> 'a.b':null
{ a:[ {} ] } -> 'a.b':null
{ a:[ { x:1 } ] } -> 'a.b':null
{ a:[ 7 ] } -> 'a.b':null

In the case where some array values have an 'a.b' field and some do not, a mixture of null and non null index keys is produced:

{ a:[ { b:5 }, {} ] } -> 'a.b':5, 'a.b':null
{ a:[ { b:5 }, { b:6 }, 99 ] } -> 'a.b':5, 'a.b':6 'a.b':null

Query Matching Semantics

For a simple query, a request for null will match missing values only if there are no non missing values for the key. But a request for null will always match an explicit null value. For query { 'a.b':null }

{ } (empty document) matches
{ a:[ ] } matches
{ a:[ 88 ] } matches
{ a:[ { b:2 } ] } does not match because there is an existing value of 'a.b'
{ a:[ { b:2 }, { } ] } does not match because there is an existing value of 'a.b', even though there is also a missing 'b' within another array element
{ a:[ { b:2 }, { b:null } ] } does match because there is an explicit null value of 'a.b'.

However, the $elemMatch operator will restrict matching to individual array elements. If a single array element has a missing 'b', the document will match null. For query { a:{ $elemMatch:{ b:null } } }

{ a:[ { b:2 } ] } does not match because the only value of b is 2
{ a:[ { b:2 }, { } ] } matches because there is a missing value of 'b' in the second array entry (note difference from non elemMatch query example above)
{ a:[ { b:2 }, { b:null } ] } does match because there is an explicit null value of 'a.b'.

This behavior was requested in SERVER-3377.

Observed behavior in this ticket:
When nested fields are missing within some elements of an array, they are indexed as null. One such null value may be used to determine document ordering when a sort is applied.

Proposed behavior for this ticket:
A null value representing a missing field will only be used for sorting if the field is missing from all nested objects within an array, unless the user performs an $elemMatch null query of the type described above.

Aaron

-----------------------------------------------------------------------------------------------

The attached test case documents and reproduces the problem, but the tl;dr is that if you have a colleciton with documents like:

{ 'x' : [ { 'subfield': x }, { 'other_subfield': a } ] }
{ 'x' : [ { 'subfield': y }, { 'other_subfield': b } ] }
{ 'x' : [ { 'subfield': z }, { 'other_subfield': c } ] }

Then trying to sort on x.subfield works in a DESCENDING sort, but fails to order the records on the selected subfield in an ASCENDING sort.



 Comments   
Comment by David Storch [ 15/Aug/17 ]

Hi all,

We have settled on consistent array sort behavior for both the find and aggregate commands under SERVER-19402. This issue report is consistent with that behavior, so I am closing this ticket as Works as Designed.

The undesirable behavior here has more to do with our indexing semantics for null than with sorting. SERVER-19402 specifies that sort key extraction is done using the same semantics as index key extraction. Therefore, if nested documents with a missing value generate a null index key, then the sorting code will also generate a null sort key. Arguably we should change index key generation to avoid this problem. This, of course, would require an index version bump. Furthermore, index key generation semantics have implications for matching semantics, and for whether or not we can generate covered plans which match on null: see SERVER-18861.

Please feel free to reach out here with any followup questions.

Best,
Dave

Comment by Ron Avnur [ 15/Feb/13 ]

aaron, proposed behavior in description sounds reasonable to me.

Comment by Aaron Staple [ 24/Oct/12 ]

As a workaround you can use the query portion of your find to ensure that the sort field does not match null, eg using original example from the description:

c = db.c;
c.drop();
 
c.save( { 'x' : [ { 'subfield': 'x' }, { 'other_subfield': 'a' } ] } );
c.save( { 'x' : [ { 'subfield': 'z' }, { 'other_subfield': 'c' } ] } );
c.save( { 'x' : [ { 'subfield': 'y' }, { 'other_subfield': 'b' } ] } );
 
// Sorts using implicit nulls.                                                                         
printjson( c.find().sort( { 'x.subfield':1 } ).toArray() );
// Does not sort using implicit nulls.                                                                 
printjson( c.find( { 'x.subfield':{ $ne:null } } ).sort( { 'x.subfield':1 } ).toArray() );

Comment by Wei Kong [ 21/Aug/12 ]

This is affecting our production deployment. We provide a cloud services to app developers and the sort is currently broken for all developers and their apps. Please bump up the priority if you can. Thanks!

Comment by Andrew Morrow (Inactive) [ 27/Feb/12 ]

Hi Aaron -

I just noticed that the 'Affects Versions' field was changed from 2.0.2 to 2.1.0 for this ticket. I definitely experience this problem with 2.0.2, and having the 'affects' set to 2.1 makes it look as if this issue only affects the unstable version. Does setting it to 2.1 mean that this will not be fixed in the stable/2.0 release series?

Comment by Aaron Staple [ 24/Feb/12 ]

Haven't fully checked, but I think the issue is that the

{'other_subfield':a}

sub document is creating an invisible null key for 'x.subfield' (see SERVER-3377). When we sort in one direction, we see a real value before this null value, but when we sort in the opposite direction we see
null as the first 'events.create.event_time' key for each of these
documents and so they are all considered equal for sorting purposes.

> c = db.c
test.c
> c.save( {a:[{b:1}]} )
> c.save( {a:[{b:3}]} )
> c.save( {a:[{b:2}]} )
> c.find().sort( {'a.b':1} )
{ "_id" : ObjectId("4f47c71c02fa1356109b9089"), "a" : [ { "b" : 1 } ] }
{ "_id" : ObjectId("4f47c71f02fa1356109b908b"), "a" : [ { "b" : 2 } ] }
{ "_id" : ObjectId("4f47c71e02fa1356109b908a"), "a" : [ { "b" : 3 } ] }
> c.find().sort( {'a.b':-1} )
{ "_id" : ObjectId("4f47c71e02fa1356109b908a"), "a" : [ { "b" : 3 } ] }
{ "_id" : ObjectId("4f47c71f02fa1356109b908b"), "a" : [ { "b" : 2 } ] }
{ "_id" : ObjectId("4f47c71c02fa1356109b9089"), "a" : [ { "b" : 1 } ] }
> c.remove()
> c.save( {a:[{b:1},{}]} )
> c.save( {a:[{b:3},{}]} )
> c.save( {a:[{b:2},{}]} )
> c.find().sort( {'a.b':1} )
{ "_id" : ObjectId("4f47c74602fa1356109b908c"), "a" : [ { "b" : 1 }, { } ] }
{ "_id" : ObjectId("4f47c74c02fa1356109b908d"), "a" : [ { "b" : 3 }, { } ] }
{ "_id" : ObjectId("4f47c75102fa1356109b908e"), "a" : [ { "b" : 2 }, { } ] }
> c.find().sort( {'a.b':-1} )
{ "_id" : ObjectId("4f47c74c02fa1356109b908d"), "a" : [ { "b" : 3 }, { } ] }
{ "_id" : ObjectId("4f47c75102fa1356109b908e"), "a" : [ { "b" : 2 }, { } ] }
{ "_id" : ObjectId("4f47c74602fa1356109b908c"), "a" : [ { "b" : 1 }, { } ] }
> c.remove()
> c.save( {a:[{b:1},{c:1}]} )
> c.save( {a:[{b:3},{c:1}]} )
> c.save( {a:[{b:2},{c:1}]} )
> c.find().sort( {'a.b':1} )
{ "_id" : ObjectId("4f47c8eb02fa1356109b908f"), "a" : [ { "b" : 1 }, { "c" : 1 } ] }
{ "_id" : ObjectId("4f47c8f802fa1356109b9090"), "a" : [ { "b" : 3 }, { "c" : 1 } ] }
{ "_id" : ObjectId("4f47c8fd02fa1356109b9091"), "a" : [ { "b" : 2 }, { "c" : 1 } ] }
> c.find().sort( {'a.b':-1} )
{ "_id" : ObjectId("4f47c8f802fa1356109b9090"), "a" : [ { "b" : 3 }, { "c" : 1 } ] }
{ "_id" : ObjectId("4f47c8fd02fa1356109b9091"), "a" : [ { "b" : 2 }, { "c" : 1 } ] }
{ "_id" : ObjectId("4f47c8eb02fa1356109b908f"), "a" : [ { "b" : 1 }, { "c" : 1 } ] }
> 

Generated at Thu Feb 08 03:07:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.