[SERVER-13298] distinct doesn't use index to distinct multikey dotted fields Created: 20/Mar/14  Updated: 29/Jan/20  Resolved: 04/Apr/19

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.4.10, 2.6.3, 3.0.14, 3.2.10, 3.5.13
Fix Version/s: 4.1.10

Type: Bug Priority: Major - P3
Reporter: Asya Kamsky Assignee: Ian Boros
Resolution: Done Votes: 17
Labels: QFB, asya, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-12869 Index null values and missing values ... Backlog
Duplicate
is duplicated by SERVER-9640 Distinct command does not use coverin... Closed
is duplicated by SERVER-20989 Distinct not uses indexes Closed
is duplicated by SERVER-27060 Distinct not using the index on an in... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0, v3.6
Sprint: Query 2019-03-11, Query 2019-03-25, Query 2019-04-08
Participants:
Case:

 Description   

Had a compound index on

{ "lineitems.returnflag":1, "lineitems.linestatus":1 }


where lineitems is an array.

Did db.orders.distinct("lineitems.returnflag") which caused a collection scan.

Non-multikey correctly used the index.



 Comments   
Comment by Githook User [ 04/Apr/19 ]

Author:

{'name': 'Ian Boros', 'username': 'puppyofkosh', 'email': 'puppyofkosh@gmail.com'}

Message: SERVER-13298 distinct now uses index for multikey dotted fields
Branch: master
https://github.com/mongodb/mongo/commit/eda597a9970ae8f1f7a4c4ede0c0f5802c612d35

Comment by Danny Couture [ 17/Nov/16 ]

important issue for us... preventing us from doing combobox for filtering

Not sure if you keep "Affects Version/s" field up to date... but we reported this bug on 3.2.10...

Really looking forward for the fix too!
Thanks.

Comment by Konstantin Bodnia [ 20/Oct/15 ]

Ok, here's the case.

We're having quite huge collection for healthcare insurance products with really huge amount of fields. For better indexing and nicer performance I created an object `_indexed` with like 30 subfields for different coverages.

  • "_indexed.fysiotherapie_vanaf_18_jaar-max_aantal_behandelingen_p_jr"
  • "_indexed.preventieve_mondzorg-max_bedrag_p_jr"

etc...

And they're arrays. I was hoping to get the values out of them to dynamically updated filters upon further and further filtering in our comparison tool. But ended up here. So for my case it would be much of a use.

I could put them all into plain fields to avoid this problem, but it would be hell in projection and other things...

Really looking forward for the fix!

Thanks.

Comment by David Storch [ 01/Jul/14 ]

Per discussion with hari.khalsa@10gen.com, this is not ready for commit just yet. The distinct optimization uses a special index access stage which returns the distinct index keys to its parent stage. In the multikey dotted case, however, the distinct stage would have to check for null or undefined keys. In the case of null or undefined, it must fetch the full document in order to disambiguate between literal nulls versus null by virtue of missing fields. We have decided to hold off unless we see that users really need this.

Comment by hari.khalsa@10gen.com [ 25/Mar/14 ]

I mocked up a fix that would allow the embedded multikey index to be used. It would imply a change in behavior. For the data:

t.save( { a : [{b:"a"}, {b:"d"}] , c : 12 } );
t.save( { a : [{b:"b"}, {b:"d"}] , c : 12 } );
t.save( { a : [{b:"c"}, {b:"e"}] , c : 12 } );
t.save( { a : [{b:"c"}, {b:"f"}] , c : 12 } );
t.save( { a : [] , c : 12 } );
t.save( { a : { b : "z"} , c : 12 } );

We expect this behavior currently:

// With index.
t.ensureIndex( { "a.b" : 1 } );
res = t.distinct( "a.b" );
res.sort()
assert.eq( "a,b,c,d,e,f,z" , res.toString() , "B4" );

But if we're using the index we'll have some "no such key found" entries appear in the output that we previously didn't consider part of the value set for distinct.

Generated at Thu Feb 08 03:31:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.