[SERVER-13899] "Whole index scan" query solutions can use incompatible indexes, return incorrect results Created: 10/May/14  Updated: 15/Nov/21  Resolved: 13/May/14

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.1, 2.7.0
Fix Version/s: 2.6.2, 2.7.1

Type: Bug Priority: Major - P3
Reporter: Andrew Zammit Assignee: J Rassi
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

version 2.6.0
gitVersion 1c1c76aeca21c5983dc178920f5052c298db616c
targetMinOS Windows 7/Windows Server 2008 R2
OpenSSLVersion
sysInfo windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
loaderFlags /nologo /DEBUG /INCREMENTAL:NO /LARGEADDRESSAWARE
compilerFlags /TP /nologo /EHsc /W3 /wd4355 /wd4800 /wd4267 /wd4244 /wd4290 /we4099 /Z7 /errorReport:none /MT /O2 /Oy-
allocator system
versionArray [2,6,0,0]
javascriptEngine V8
bits 64
debug FALSE
maxBsonObjectSize 16777216


Issue Links:
Depends
Duplicate
is duplicated by SERVER-14369 Bug when sorting on a date attribute ... Closed
Related
related to SERVER-10801 Allow compound geo indexes to be able... Backlog
Backport Completed:
Steps To Reproduce:

use test;
db.sorthashed.insert([
	{str:'bravo'},
	{str:'alpha'},
	{str:'tango'},
	{str:'charlie'}
]);
db.sorthashed.ensureIndex({str:'hashed'},{background:false});
db.sorthashed.find({}).sort({str:1}).pretty();

Sprint: Server 2.7.1
Participants:

 Description   
Issue Status as of May 16, 2014

ISSUE SUMMARY
When the query planner is picking an index for a whole index scan solution (to provide a sort), it considers the index's sparseness flag but doesn't consider whether or not the index is of a special index type. Thus the wrong index may be used and results won't be what the user expects. This is a regression from 2.4.

USER IMPACT
Incorrectly using a hashed index for a sort will generate results out of order. Incorrectly using a text or geo index for a sort will similarly generate results out of order, and additionally can cause results to be missing from the result set.

WORKAROUNDS
On a query-by-query basis, users may work around this issue by employing the hint() feature or index filter feature to force the query planner to choose a non-offending index. There is no workaround that can be employed server-wide.

AFFECTED VERSIONS
MongoDB production versions 2.6.0 and 2.6.1 are affected by this issue.

FIX VERSION
The fix is included in the 2.6.2 production release.

RESOLUTION DETAILS
Use indexes of a special index type to provide a sort only when the query predicate can be used to guarantee the sort order and that all documents to be returned are indexed.

Original description

If a particular field is defined as a hashed index, when sorting the result set is not in the expected order. When using a non-hashed (normal) index or no index at all on the specific String field, the result set's order is correct when sorting. This can be reproduced in the MongoDB shell, Node.js driver, and PHP driver, so I'm assuming this is a problem with the core server.

The steps to reproduce show only how to reproduce the problem, but the following code illustrates the effect:

use test;
 
db.sorthashed.insert([
	{str:'bravo'},
	{str:'alpha'},
	{str:'tango'},
	{str:'charlie'}
]); // intentionally inserted out of alpha order, field in question is `str`
db.sorthashed.find({}).sort({str:1}).pretty(); // shows in order
 
db.sorthashed.ensureIndex({str:'hashed'},{background:false});
db.sorthashed.find({}).sort({str:1}).pretty(); // shows out of order
 
db.sorthashed.dropIndex('str_hashed');
db.sorthashed.find({}).sort({str:1}).pretty(); // shows in order
 
db.sorthashed.ensureIndex({str:-1},{background:false}); // using desc index order as an example
db.sorthashed.find({}).sort({str:1}).pretty(); // shows in order



 Comments   
Comment by Githook User [ 17/May/14 ]

Author:

{u'username': u'jrassi', u'name': u'Jason Rassi', u'email': u'rassi@10gen.com'}

Message: SERVER-13899 Remove unused stripInvalidAssignmentsTo2dsphereIndex arg
Branch: master
https://github.com/mongodb/mongo/commit/3f1255ba06d7ce45d0ad748d7f8a957cd7207d94

Comment by Githook User [ 16/May/14 ]

Author:

{u'username': u'jrassi', u'name': u'Jason Rassi', u'email': u'rassi@10gen.com'}

Message: SERVER-13899 Forbid whole index scan on plugin index to provide sort
(cherry picked from commit a6a0c243b6cd6a5d45c876ab100a21073c070a00)
Branch: v2.6
https://github.com/mongodb/mongo/commit/2cebdd054c5f1686179e5a2db1ea46adf6670f37

Comment by J Rassi [ 13/May/14 ]

would the a:1 index be utilized since a:'hashed' wouldn't be considered in the query plan?

Yes. All indexes are considered, and only some are eligible.

Comment by Andrew Zammit [ 13/May/14 ]

Jason, That makes a bit more sense. Thanks for spelling it out.

What if I had two indices? For example: a:1 and a:'hashed', although I'm not sure of the implications of this in practice. When using

find({b:1}).sort({a:1})

would the a:1 index be utilized since a:'hashed' wouldn't be considered in the query plan?

Comment by J Rassi [ 13/May/14 ]

The fix is not quite as you describe. It affects one particular type of query plan: one that scans an entire index in order to avoid an in-memory sort. Consider how the index {a:1} could be used to help accelerate the query find({b:1}).sort({a:1}); a "whole index scan" on this index can provide the sort order for the query (and a filter is employed to discard results that don't match the query predicate). In practice this is often not a useful query plan, but it does work well if many documents match the query predicate and the user requests a limit.

The only reason that this query plan is viable is that the index order for the {a:1} index exactly matches the sort order {a:1} requested. Since geo indexes, hashed indexes and text indexes do not store documents in this order, they can't be considered for this query plan (compound indexes like {a:1, b: "2dsphere"} do actually store documents in this order, but because these indexes are sparse on geo fields, additional logic is required to guarantee that these index scans won't miss results; see comment at query_planner.cpp:833).

Comment by Andrew Zammit [ 13/May/14 ]

Thanks Jason for hopping on this so quickly.

To confirm, the solution is to not use the index to sort unless it is a regular index (i.e. a:1 is regular, but a:'hashed' is irregular). Wouldn't performance degrade if it is not utilizing the index? You're the expert and perhaps I don't fully understand why text, 2dsphere, hashed, etc. can't be used, but I'm not convinced this is optimal solution.

Comment by Githook User [ 13/May/14 ]

Author:

{u'username': u'jrassi', u'name': u'Jason Rassi', u'email': u'rassi@10gen.com'}

Message: SERVER-13899 Forbid whole index scan on plugin index to provide sort
Branch: master
https://github.com/mongodb/mongo/commit/a6a0c243b6cd6a5d45c876ab100a21073c070a00

Comment by J Rassi [ 10/May/14 ]

Andrew, thanks for reporting this issue.

When the query planner is picking an index for a whole index scan solution (to provide a sort), it considers the index's sparseness flag but doesn't consider whether or not the index is of a special index type. Confirmed as a regression since 2.4.

Incorrectly using a hashed index for a sort will generate results out of order (per repro above). Incorrectly using a text or geo index for a sort will similarly generate results out of order, and additionally can cause results to be missing from the result set:

> db.foo.insert({a:1})
WriteResult({ "nInserted" : 1 })
> db.foo.ensureIndex({a:1,b:"text"})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.foo.find().itcount()
1
> db.foo.find().sort({a:1}).itcount()
0

> db.foo.insert({a:null})
WriteResult({ "nInserted" : 1 })
> db.foo.ensureIndex({a:"2dsphere"})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.foo.find().itcount()
1
> db.foo.find().sort({a:1}).itcount()
0

Generated at Thu Feb 08 03:33:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.