[SERVER-12328] Query: index intersection causes regression when index key set has low cardinality Created: 10/Jan/14  Updated: 11/Jul/16  Resolved: 16/Jan/14

Status: Closed
Project: Core Server
Component/s: Concurrency, Querying
Affects Version/s: 2.5.4
Fix Version/s: 2.5.5

Type: Bug Priority: Major - P3
Reporter: Davide Italiano Assignee: David Storch
Resolution: Done Votes: 0
Labels: 26qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux ip-10-237-131-70 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


Attachments: PNG File 2014-01-09-202506_1280x768_scrot.png     File idxisect_regression.js    
Operating System: ALL
Participants:

 Description   

Queries of the type

find({x: Integer, y:Integer}) 

on documents which schema is

{x: Integer, y: Integer} 

are about 2x slower on 2.6 with respect to 2.4, in case one of the two field key set has low cardinality, index intersection is on, and two indexes are built (one on each field, x and y).
Disabling index intersection makes the problem going away.

Analysis of the problem:
It mostly relies on the couple idx intersection + low cardinality. Imagine the worst case scenario where you have all the documents where one field is fixed and the other is an increasing integer, namely:

{x: 0, y: 0}, {x:1, y: 0}, {x: 2, y:0} ...

and the two aforementioned indexes.
Now, if index intersection is ON, and you do a query of the type

db.goo.find({x: something, y: 0}) 

your query engine will likely scan the whole index for y (and part of the index for x), causing the huge performance drawback. This is somewhat confirmed by experimental results, see attached screenshot + script. There's a point after which index intersection is a clear win – unfortunately I think it's hard to choose such value without statistics/histograms.
Now, I still think that this is somewhat a corner case because people should never build an index on a key with very low cardinality as the one showed in the example, but still, 2.4.8 doesn't exhibits the following behavior so the query optimizer should probably do its best to not pick the idx isect plan in this case.



 Comments   
Comment by Githook User [ 07/Mar/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-12328 less aggressive plan cache evictions when the stddev is small
Branch: master
https://github.com/mongodb/mongo/commit/3ac69b3f5337e5e6f0cd36cc71173d07e74fa281

Comment by Githook User [ 16/Jan/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-12328 fix bad verifies in cached plan runner
Branch: master
https://github.com/mongodb/mongo/commit/3169e1d6d7eb7306302c423b5710c5ce077ca7be

Comment by Githook User [ 16/Jan/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-12328 do not add cache feedback if the runner has been killed
Branch: master
https://github.com/mongodb/mongo/commit/1721db7bddcec2c61c121de961a899622f47c6ec

Comment by Githook User [ 15/Jan/14 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-12328 remove plans from the cache if they are not performant
Branch: master
https://github.com/mongodb/mongo/commit/f211f148068b7acff6a06f5ccaa31feab654fd0f

Generated at Thu Feb 08 03:28:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.