[SERVER-14832] Distinct command result set may include or exclude null/undefined depending on presence of index Created: 08/Aug/14  Updated: 24/Jan/24

Status: Needs Scheduling
Project: Core Server
Component/s: Index Maintenance, Querying
Affects Version/s: 2.6.0, 2.6.3
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Michael Narayan Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 8
Labels: indexv3, open_todo_in_code, query-44-grooming, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-53626 Minimize index scanning when retrievi... Backlog
Duplicate
is duplicated by SERVER-31343 Distinct over empty indexed arrays pr... Closed
is duplicated by SERVER-31838 Query planner generates DISTINCT_SCAN... Closed
is duplicated by SERVER-85229 The same distinct command, the result... Closed
is duplicated by SERVER-17457 Inclusion of "null" in "distinct" res... Closed
Related
related to SERVER-85298 distinct command with a winning plan ... Closed
related to SERVER-12869 Index null values and missing values ... Backlog
Assigned Teams:
Query Optimization
Operating System: ALL
Steps To Reproduce:

db.test.drop();
db.test.insert({ field: [] });
 
// With no index:
db.test.distinct('field');
// =>  [ ]
 
// With an index:
db.test.ensureIndex({ field: 1 });
db.test.distinct('field');
// =>  [ null ]

And now with an additional document:

db.test.drop();
db.test.insert({ field: [] });
db.test.insert({ field: [1, 2] });
 
// With no index:
db.test.dropIndex({ field: 1 });
db.test.distinct('field');
// =>  [ 1, 2 ]
 
// With an index:
db.test.ensureIndex({ field: 1 });
db.test.distinct('field');
// =>  [ null, 1, 2 ]

Participants:

 Description   

When distinct is called on a field for which there exists a document with an empty array in said field, it will return null in the result if there is an index on the field.



 Comments   
Comment by Adrien Jarthon [ 14/May/20 ]

Ah ok, I didn't though this would be possible:

if there is a document with field present and having value undefined

Comment by Asya Kamsky [ 14/May/20 ]

The same problem exists for undefined - if there is a document with field present and having value undefined we would fail to return it, which would be incorrect.

Comment by Adrien Jarthon [ 14/May/20 ]

Well ok for the "null" but wouldn't it work with "undefined" then? (it looks like it's "undefined" that is returned in recent versions of mongo)

Comment by Asya Kamsky [ 14/May/20 ]

bigbourin@gmail.com

> if a field is missing or an array is empty no value should end-up in the distinct result, right?

If the field exists and has the value null we are supposed to return null as part of the result set. This is not currently possible. If we could omit null then your workaround would work.

Comment by Adrien Jarthon [ 13/May/20 ]

Thanks for the confirmation! I'm sure I understand why it would require a new index format though, can't you filter out the null or undefined entries when scanning the index for a distinct query? or even after the fact before returning the response to the client? Because as far as I understand there's no cases where we expect the distinct query to return such values? if a field is missing or an array is empty no value should end-up in the distinct result, right?

Comment by David Percy [ 13/May/20 ]

This is a bug: indexes shouldn't change the meaning of a query. Unfortunately it's not an easy fix, because our current index format has some ambiguities. (It uses null to represent missing fields, and multikey indexes use undefined to represent "zero items" for empty arrays.) We don't want to disable DISTINCT_SCAN plans, so fixing this will require us to introduce a new index format.

Comment by Adrien Jarthon [ 06/May/20 ]

Just so we know, is this a bug or a feature? shall we expect this behavior to change or just patch our code to handle this case?
I guess this might be useful in some cases where you really want to distinguish nil/empty array vs undefined but so far in uses of distinct in our code this was proven to be just an additional burden to clean.

So far I wrote a Mongoid (6.1.1) patch to circumvent this:

module Mongoid
  module Contextual
    class Mongo
      module DistinctWithoutUndefined
        def distinct(field)
          super(field).delete_if { |i| BSON::Undefined === i }
        end
      end
 
      prepend DistinctWithoutUndefined
    end
  end
end

Comment by Robert Weissmann [ 20/Jul/18 ]

Please fix this. Don't get the difficulty. Thanks.

Comment by Michael Smith [ 29/Jan/18 ]

This occurs for string fields, too, in 3.6.2:

db.xxx.insert({a:1,b:"value1"})
db.xxx.insert({a:1,b:"value2"})
db.xxx.insert({a:1})
db.xxx.createIndex({a:1,b:1})
 
> db.xxx.distinct("b")
[ "value1", "value2" ]
 
> db.xxx.distinct("b", {a:1})
[ null, "value1", "value2" ]

In the second query, where an index is used, MongoDB returns a null for the distinct query.

Comment by Tom Grossman [ 10/Oct/16 ]

latest version (3.2) returns `undefined`, the issue still exists.

Comment by Asya Kamsky [ 19/Apr/16 ]

Current master now returns [ undefined ] rather than null when there's an index.

Comment by Max Jacobson [ 04/Jan/16 ]

Hello!

I'm observing exactly the issue reported by Michal Narayan.

This is using MongoDB 3.0.7

Here's some output:

> db.donuts.find()
{ "_id" : ObjectId("568adbd1e187c54cae539e59"), "fruits" : [ "kiwi" ] }
{ "_id" : ObjectId("568adbdee187c54cae539e5a"), "fruits" : [ "banana", "kiwi" ] }
{ "_id" : ObjectId("568adbfae187c54cae539e5c"), "fruits" : [ ] }
> db.donuts.distinct("fruits")
[ null, "banana", "kiwi" ]
> db.donuts.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "lol.donuts"
        },
        {
                "v" : 1,
                "key" : {
                        "fruits" : 1
                },
                "name" : "fruits_1",
                "ns" : "lol.donuts"
        }
]
> db.donuts.dropIndex("fruits_1")
{ "nIndexesWas" : 2, "ok" : 1 }
> db.donuts.distinct("fruits")
[ "kiwi", "banana" ]

Kind of mysterious

Comment by Ramon Fernandez Marina [ 08/Aug/14 ]

Hmmm, I can no longer find the session where I tried 2.7.4, but I can indeed reproduce this behavior in 2.7.4 as well so it must have been pilot error on my side – apologies for the confusion.

Comment by Michael Narayan [ 08/Aug/14 ]

I should probably also add that the null that appears is not a "standard" one (I initially noticed the issue because it was breaking a BSON load in Ruby Moped):

> db.test.drop();
> db.test.insert({ field: [] });
> db.test.insert({ field: [null] });
>
> db.test.distinct('field');
[ null]
>
> db.test.ensureIndex({ field: 1 });
> db.test.distinct('field');
[ null, null ]

Comment by Michael Narayan [ 08/Aug/14 ]

I'm actually still seeing this on 2.7.4:

> db.version()
2.7.4
> db.test.drop();
> db.test.insert({ field: [] });
>
> db.test.distinct('field');
[ ]
>
> db.test.ensureIndex({ field: 1 });
> db.test.distinct('field');
[ null ]

as well as the latest nightly (9d1e5f95a2739a7672b697deb7d1da04ca8c4117).

Comment by Ramon Fernandez Marina [ 08/Aug/14 ]

mnarayan01, thanks for reporting this. We're able to reproduce the behavior you describe in 2.6.3, but the issue has already been fixed in the latest development version (2.7.4).

Regards,
Ramón.

Generated at Thu Feb 08 03:36:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.