[SERVER-25111] For index scans, maxScan is returning one less than the parameter Created: 15/Jul/16 Updated: 06/Dec/22 Resolved: 19/Aug/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Shell |
| Affects Version/s: | 3.3.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | George Thompson | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Windows 10 |
| Assigned Teams: | Query |
| Operating System: | ALL |
| Steps To Reproduce: | In the MongoDB 3.3.5 shell using the WiredTiger storage engine, run a query that uses maxScan(n) with an index-backed sort(): it returns only n-1 documents. In the MongoDB 2.2.0 shell using the legacy storage engine, the equivalent $maxScan query returns n documents (see the sketch below).
|
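A minimal sketch of the kind of repro described (collection and field names here are assumptions, not the reporter's originals):

```javascript
// Hypothetical repro: an indexed field and an index-backed sort.
db.maxscantest.drop();
db.maxscantest.ensureIndex({ a: 1 });   // ensureIndex exists in both the 2.2 and 3.3 shells
for (var i = 0; i < 10; i++) {
    db.maxscantest.insert({ a: i });
}
// Allow at most 3 index keys to be examined.
// Per the report: the 3.3.5/WiredTiger shell prints 2, the 2.2.0 shell prints 3.
print(db.maxscantest.find().sort({ a: 1 }).maxScan(3).itcount());
```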
| Participants: |
| Description |
|
When using maxScan(n) with a sort(), it only returns n-1 documents. When I use $maxScan with MongoDB 2.2.0 with the legacy storage engine, it returns n documents. Is this a bug or a change in maxScan that I don't understand? |
| Comments |
| Comment by Ian Whalen (Inactive) [ 19/Aug/16 ] |
|
Since applications shouldn't depend on maxScan having a strong contract about the number of results that get returned, we've elected not to fix this issue. |
| Comment by David Storch [ 18/Jul/16 ] |
|
therefore, you're welcome! Glad to hear that max + limit + skip will work now. Separately, our team will triage this ticket regarding the proposed fix to maxScan. |
| Comment by George Thompson [ 18/Jul/16 ] |
|
Thanks david.storch. We are doing exactly that in our code, and explain() confirms that it is scanning only two index keys (see the sketch below). We started this project with MongoDB r2.0.7 and, according to our notes, when using $min with a limit (nToReturn = 1) and a skip (nToSkip = 1), an unbounded index scan occurred, which we solved by using $maxScan. Could be the analyst was wrong. Happens. I'm happy maxScan is not necessary! Thanks for all of your help and guidance. |
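For reference, a sketch of how that key count can be read from explain() in the shell (the collection, index, and key value below are placeholders, not the reporter's schema):

```javascript
// Hypothetical "previous document" query; totalKeysExamined shows how many
// index keys the plan actually touched.
var currentSeq = 5;  // placeholder index key of the current document
var stats = db.items.find()
    .max({ seq: currentSeq })
    .hint({ seq: 1 })
    .sort({ seq: -1 })
    .limit(1)
    .explain("executionStats");
print(stats.executionStats.totalKeysExamined);
```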
| Comment by David Storch [ 18/Jul/16 ] |
|
Hi therefore,
Correct, you should be able to get the behavior you want using .max() with a descending sort and .limit(). For example,
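A sketch of that pattern (collection, index, and key value are placeholders):

```javascript
// Hypothetical "previous document" query: max() sets an exclusive upper bound
// on the { seq: 1 } index, the descending sort walks backward from that bound,
// and limit(1) returns a single document.
var currentSeq = 5;  // placeholder index key of the current document
db.items.find()
    .max({ seq: currentSeq })
    .hint({ seq: 1 })
    .sort({ seq: -1 })
    .limit(1);
```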
The bug that you had run into in 2013 was fixed in a later release.

Best, |
| Comment by George Thompson [ 16/Jul/16 ] |
|
FWIW, here is the code we use to get to the "next" and "previous" document (without and with maxScan). Previously in the code we created
Presuming that $maxScan is required to limit the index scan, the original code had:
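A shell-level sketch of what is being described (the actual code is C++; collection, index, and key names are assumptions):

```javascript
// Hypothetical shell equivalents of the original "next"/"previous" queries.
// maxScan(2) was intended to bound the scan to the current key plus one neighbour.
var currentSeq = 5;  // placeholder index key of the current document

// "next": start the index scan at the current key, skip it, take one document.
db.items.find().min({ seq: currentSeq }).hint({ seq: 1 })
    .sort({ seq: 1 }).skip(1).limit(1).maxScan(2);

// "previous": scan backward from just below the current key, take one document.
db.items.find().max({ seq: currentSeq }).hint({ seq: 1 })
    .sort({ seq: -1 }).limit(1).maxScan(2);
```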
which would have to be changed to this to handle this bug:
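Under this bug one extra key has to be allowed, so the same sketch would simply bump the value from 2 to 3:

```javascript
// Same hypothetical queries, with maxScan raised so the neighbouring document
// is still returned despite the off-by-one described in this ticket.
db.items.find().min({ seq: currentSeq }).hint({ seq: 1 })
    .sort({ seq: 1 }).skip(1).limit(1).maxScan(3);
db.items.find().max({ seq: currentSeq }).hint({ seq: 1 })
    .sort({ seq: -1 }).limit(1).maxScan(3);
```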
|
| Comment by George Thompson [ 16/Jul/16 ] |
|
Hi david.storch, thank you for your analysis. According to our notes (from back in 2013!), we used maxScan because our C++ queries used $min with std::auto_ptr< DBClientCursor > mongo::DBClientConnection::query, setting int nToReturn = 1:
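Roughly, in shell terms (field names and values are placeholders; the original was C++ driver code):

```javascript
// Hypothetical shell equivalent of the 2013-era query: $min positions the index
// scan at the current key, and skip(1)/limit(1) step to the next document.
// Per the notes above, this scanned the index without bound at the time, which
// is why maxScan(2) was added.
var currentSeq = 5;  // placeholder
db.items.find().min({ seq: currentSeq }).hint({ seq: 1 })
    .sort({ seq: 1 }).skip(1).limit(1);
```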
So, my rewrite of the code continues to use $min (or $max, depending on whether we are traversing the index forward or backward) together with maxScan.

My question: are you saying that using $min without a $max (or a $max without a $min) will not be an unbounded index scan if I use limit()? If not, then I would argue the value of maxScan stands. I guess I could work around the problem by simply changing maxScan from 2 to 3 and then keep an eye on this bug's progress, but that seems kludgy.

My use case: https://jira.mongodb.org/browse/SERVER-9540, which at the time required "a second index traversing the documents in the opposite direction. This has only a very small impact on insert speed, but it does double the space requirements for the index." (source) Luckily your query rewrite has fixed this problem (which caused the delay of our project, since that was not considered acceptable). https://jira.mongodb.org/browse/SERVER-9547 |
| Comment by David Storch [ 15/Jul/16 ] |
|
Hi therefore,

Thanks for the report. The MongoDB query execution engine was entirely rewritten for version 2.6, and it looks like you have stumbled upon a behavior change between the old implementation of the query engine and the current one. The current implementation of the maxScan option in the IXSCAN stage stops returning query results as soon as the index scan has examined maxScan keys: https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/index_scan.cpp#L166-L169

Specifically, the following happens when you run your repro script. The index scan looks at the first key, increments keysExamined, and then returns the key. When it looks at the second key, it increments keysExamined, but before returning the key, it notices that keysExamined is equal to maxScan. At this point, it ends the scan and stops returning further query results, without ever returning the second key. Arguably, this is incorrect, and we should fix it by modifying the code to look like this:
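Purely as an illustration of the two orderings, and not the actual index_scan.cpp code or the proposed patch, a toy JavaScript model:

```javascript
// Toy model only: how the increment/check order changes the number of keys returned.
function scanKeys(keys, maxScan, checkBeforeIncrement) {
    var keysExamined = 0;
    var returned = [];
    for (var i = 0; i < keys.length; i++) {
        if (checkBeforeIncrement) {
            // proposed ordering: stop only once maxScan keys have already been returned
            if (keysExamined >= maxScan) break;
            keysExamined++;
        } else {
            // current ordering: the key that reaches the limit is counted but never returned
            keysExamined++;
            if (keysExamined >= maxScan) break;
        }
        returned.push(keys[i]);
    }
    return returned;
}
scanKeys([10, 20, 30, 40], 2, false).length;  // 1 -> maxScan - 1 results (current)
scanKeys([10, 20, 30, 40], 2, true).length;   // 2 -> maxScan results (proposed)
```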
In other words, we should only increment keysExamined after making the maxScan check.

That said, maxScan makes no guarantees about how many query results should be returned to the application. If you wish to restrict the result set to a particular size, the correct option to use is cursor.limit(). The intended use case for maxScan is to protect your database against runaway slow queries which perform large index scans (if, for example, your query patterns are unpredictable and you can't ensure that every query will be well indexed).

I am going to move this ticket into a Needs Triage state so that it can be evaluated by the development team. In the meantime, I highly recommend that, if possible, you migrate your application to use limit rather than maxScan.

Best, |
| Comment by George Thompson [ 15/Jul/16 ] |
|
Thanks. We came across this problem while rewriting our C++ code to the new standard and using "$maxScan" << 2 with bsoncxx::builder::stream::document find_modifiers. We use this extensively in our application. |
| Comment by Ramon Fernandez Marina [ 15/Jul/16 ] |
|
Thanks for your report, therefore. I'm able to reproduce the behavior you describe from 2.6 to 3.2 as well, so we're investigating. |