[SERVER-25111] For index scans, maxScan is returning one less than the parameter Created: 15/Jul/16  Updated: 06/Dec/22  Resolved: 19/Aug/16

Status: Closed
Project: Core Server
Component/s: Querying, Shell
Affects Version/s: 3.3.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: George Thompson Assignee: Backlog - Query Team (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 10
MongoDb 3.3.5


Assigned Teams:
Query
Operating System: ALL
Steps To Reproduce:

In MongoDB 3.3.5 shell using the WiredTiger storage engine:

use TESTMAXSCAN
 
db.users.insert({name:"sue",age:26,status:"A"})
db.users.insert({name:"bob",age:42,status:"B"})
db.users.insert({name:"mark",age:29,status:"B"})
 
db.users.find()
{ "_id" : ObjectId("57891a9c52fec6668177681b"), "name" : "sue", "age" : 26, "status" : "A" }
{ "_id" : ObjectId("57891aad52fec6668177681c"), "name" : "bob", "age" : 42, "status" : "B" }
{ "_id" : ObjectId("57891abc52fec6668177681d"), "name" : "mark", "age" : 29, "status" : "B" }
 
// This is good (a document scan):
db.users.find().maxScan(2)
{ "_id" : ObjectId("57891a9c52fec6668177681b"), "name" : "sue", "age" : 26, "status" : "A" }
{ "_id" : ObjectId("57891aad52fec6668177681c"), "name" : "bob", "age" : 42, "status" : "B" }
 
db.users.createIndex({name:1})
 
// This is where I am confused with an index scan -- why only one?
db.users.find().maxScan(2).sort({name:1})
{ "_id" : ObjectId("57891aad52fec6668177681c"), "name" : "bob", "age" : 42, "status" : "B" }

In MongoDB 2.2.0 shell using the legacy database:

 
use TESTMAXSCAN
 
db.users.insert({name:"sue",age:26,status:"A"})
db.users.insert({name:"bob",age:42,status:"B"})
db.users.insert({name:"mark",age:29,status:"B"})
 
db.users.find()
{ "_id" : ObjectId("57891a9c52fec6668177681b"), "name" : "sue", "age" : 26, "status" : "A" }
{ "_id" : ObjectId("57891aad52fec6668177681c"), "name" : "bob", "age" : 42, "status" : "B" }
{ "_id" : ObjectId("57891abc52fec6668177681d"), "name" : "mark", "age" : 29, "status" : "B" }
 
db.users.find({$query:{},$maxScan:2})
{ "_id" : ObjectId("57891c05adc58efb84db213f"), "name" : "sue", "age" : 26, "status" : "A" }
{ "_id" : ObjectId("57891c06adc58efb84db2140"), "name" : "bob", "age" : 42, "status" : "B" }
 
db.users.createIndex({name:1})
 
// Notice it prints two documents
db.users.find({$query:{},$maxScan:2,$orderby:{name:1}})
{ "_id" : ObjectId("57891c06adc58efb84db2140"), "name" : "bob", "age" : 42, "status" : "B" }
{ "_id" : ObjectId("57891c09adc58efb84db2141"), "name" : "mark", "age" : 29, "status" : "B" }

Participants:

 Description   

When using maxScan(n) with a sort(), it only returns n-1 documents. When I use $maxScan with MongoDB 2.2.0 with the legacy storage engine, it returns n documents.

Is this a bug or a change in maxScan that I don't understand?



 Comments   
Comment by Ian Whalen (Inactive) [ 19/Aug/16 ]

Since applications shouldn't depend on maxScan having a strong contract about the number of results that get returned, we've elected not to fix this issue.

Comment by David Storch [ 18/Jul/16 ]

therefore, you're welcome! Glad to hear that max + limit + skip will work now. Separately, our team will triage this ticket regarding the proposed fix to maxScan.

Comment by George Thompson [ 18/Jul/16 ]

Thanks david.storch. We are doing exactly that in our code and the explain() confirms that it is scanning only two index keys. We started this project with mongoDB r2.0.7 and, according to our notes, when using $min with a limit (nToReturn = 1) and a skip (nToSkip=1) an unbounded index scan occurred, solved by using $maxScan. Could be the analyst was wrong. Happens.

I'm happy maxScan is not necessary!

Thanks for all of your help and guidance.

Comment by David Storch [ 18/Jul/16 ]

Hi therefore,

My question: Are you saying that using $min without a $max (or a $max without a $min) will not be an unbounded index scan if I use limit()?

Correct, you should be able to get the behavior you want using .max() with a descending sort and .limit(). For example,

> db.c.drop()
> db.c.insert({a: 1, b: 1})
> db.c.insert({a: 1, b: 2})
> db.c.insert({a: 2, b: 1})
> db.c.insert({a: 2, b: 2})
> db.c.createIndex({a: 1, b: 1})
> db.c.find().max({a: 2, b: 1}).sort({a: -1, b: -1}).skip(1).limit(1)
{ "_id" : ObjectId("578ce61471fccfc5175a24c0"), "a" : 1, "b" : 2 }

The bug that you had run into in 2013 was fixed in SERVER-15015. All stable versions greater than or equal to 2.6.5 contain the fix. So, I believe it is still the case that it would be better for your application to use limit than maxScan (though there was a lot of context in the linked ticket and in its related SERVER tickets, so please let me know if I'm missing something).

Best,
Dave

Comment by George Thompson [ 16/Jul/16 ]

FWIW, here is the code we use to get to the "next" and "previous" document (without and with maxScan).

Previously in the code we created

* fields_to_return_builder for projection, 
* hint_builder (e.g., {employeenum:1, check:1}),  
* sort_builder ({employeenum:1, check:1} if "next", {employeenum:-1, check:-1} if "previous") and 
* current_key_values_builder ({employeenum:2, check:"A"}).
 
position_offset = 0 to display the current document in the collection.
                  1 to display the next document
                 -1 to display the previous document
 
mongocxx::client conn{ mongocxx::uri{} };
auto db = conn[current_file_blueprint->application_id()];
 
mongocxx::options::find find_options;
 
find_options.projection(fields_to_return_builder.view()); // { _id: 0, employeenum: 1, check: 1, address: 1}
			
find_options.limit(1);
find_options.skip(std::abs(position_offset));	// skip = 0 (display current document) or 1 (display next or previous document)
			
bsoncxx::builder::stream::document find_modifiers;
 
if (position_offset >= 0)
    find_modifiers << "$min" << open_document << concatenate(current_key_values_builder.view()) << close_document; // $min: { employeenum: 2, check: "A" }
else
    find_modifiers << "$max" << open_document << concatenate(current_key_values_builder.view()) << close_document; // $max: { employeenum: 2, check: "A" }
 
find_options.modifiers(find_modifiers.view());
mongocxx::hint hint(hint_builder.view());
find_options.hint(hint);    // { employeenum:  1, check:  1 }
find_options.sort(sort_builder.view());	// "next" { employeenum:  1, check:  1 } or "previous" { employeenum:  -1, check:  -1 } 
 
mongocxx::cursor document_to_display = db[current_file_blueprint->id()].find({},find_options);

Presuming that $maxScan is required to limit the index scan, the original code had:

if (position_offset >= 0)
    find_modifiers << "$min" << open_document << concatenate(current_key_values_builder.view()) << close_document << "$maxScan" << 2;
else
    find_modifiers << "$max" << open_document << concatenate(current_key_values_builder.view()) << close_document << "$maxScan" << 2;

which would have to be changed to this to handle this bug:

if (position_offset >= 0)
    find_modifiers << "$min" << open_document << concatenate(current_key_values_builder.view()) << close_document << "$maxScan" << 3;
else
    find_modifiers << "$max" << open_document << concatenate(current_key_values_builder.view()) << close_document << "$maxScan" << 3;

Comment by George Thompson [ 16/Jul/16 ]

Hi david.storch

Thank you for your analysis. According to our notes (from back in 2013!), we used maxScan because in our C++ queries we used $min with std::auto_ptr< DBClientCursor > mongo::DBClientConnection::query setting int nToReturn = 1:

Want to constrain the query from searching at the $min point to the end of the collection which according to explain() it does if I use limit(1). It needs to scan twice (guessing) because it scans the current index value and then needs to only get one more.

So, my rewrite of the code continues to use

bsoncxx::builder::stream::document find_modifiers;
if (position_offset >= 0)
	find_modifiers << "$min" << open_document << concatenate(current_key_values_builder.view()) << close_document;	// $min: { employeenum: 2, check: "A" }
else
	find_modifiers << "$max" << open_document << concatenate(current_key_values_builder.view()) << close_document;	// $max: { employeenum: 2, check: "A" }
find_options.modifiers(find_modifiers.view());

depending if we are traversing the index forward or backwards.

My question: Are you saying that using $min without a $max (or a $max without a $min) will not be an unbounded index scan if I use limit()? If not, then I would argue the value of maxScan stands. I guess I could workaround the problems by simply changing maxScan from 2 to 3 and then keep an eye on this bug's progress but that seems kludgy.

My use case: https://jira.mongodb.org/browse/SERVER-9540 which at the time required "a second index traversing the documents in the opposite direction. This has only a very small impact on insert speed, but it does double the space requirements for the index." source Luckily your query rewrite has fixed this problem (which caused the delay of our project since that was not considered acceptable). https://jira.mongodb.org/browse/SERVER-9547

Comment by David Storch [ 15/Jul/16 ]

Hi therefore,

Thanks for the report. The MongoDB query execution engine was entirely rewritten for version 2.6, and it looks like you have stumbled upon a behavior change between the old implementation of the query engine and the current one. The current implementation of the maxScan option in the IXSCAN stage stops returning query results as soon as the index scan has examined maxScan keys:

https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/index_scan.cpp#L166-L169

Specifically, the following happens when you run your repro script. The index scan looks at the first key, increments keysExamined, and then returns the key. When it looks at the second key, it increments keysExamined, but before returning the key, it notices that keysExamined is equal to maxScan. At this point, it ends the scan and stops returning further query results, without ever returning the second key. Arguably, this is incorrect, and we should fix it by modifying the code to look like this:

        if (_params.maxScan && _specificStats.keysExamined >= _params.maxScan) {
            kv = boost::none;
        }
        ++_specificStats.keysExamined;

In other words, we should only increment keysExamined after making the maxScan check.

That said, maxScan makes no guarantees about how many query results should be returned to the application. If you wish to restrict the result set to a particular size, the correct option to use is cursor.limit(). The intended use case for maxScan is to protect your database against runaway slow queries which perform large index scans (if, for example, your query patterns are unpredictable and you can't ensure that every query will be well indexed).

I am going to move this ticket into a Needs Triage state so that it can be evaluated by the development team. In the meantime, I highly recommend that, if possible, you migrate your application to use limit rather than maxScan.

Best,
Dave

Comment by George Thompson [ 15/Jul/16 ]

Thanks. We came across this problem while rewriting our C++ code to the new standard and using "$maxScan" << 2 with bsoncxx::builder::stream::document find_modifiers. We use this extensively in our application.

Comment by Ramon Fernandez Marina [ 15/Jul/16 ]

Thanks for your report therefore, I'm able to reproduce the behavior you describe from 2.6 to 3.2 as well, so we're investigating.

Generated at Thu Feb 08 04:08:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.