[SERVER-14703] Snapshot queries can miss records if there are concurrent updates Created: 27/Jul/14 Updated: 31/Oct/16 Resolved: 14/Dec/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | David Storch |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Sprint: | QuInt E (01/11/16) |
| Participants: | |
| Description |
|
"Snapshot" queries (queries using a unique index) can miss records if there are concurrent updates that move the records being queried (even if they don't modify the query keys). It would be useful if this could be improved so that snapshot queries were guaranteed to return all records that existed for the lifetime of the query (neither inserted nor deleted during the query). Reproduce as follows:
|
| Comments |
| Comment by David Storch [ 14/Dec/15 ] |
|
Per jason.rassi's earlier comment, we believe that this issue was fixed in 3.1.2 as part of the SortedDataInterface::Cursor refactor. I am therefore closing this ticket as a duplicate of Also note that
| Comment by J Rassi [ 31/Jul/15 ] |
|
Per discussion with Dave and Bruce, we believe that it is no longer possible to reproduce a case where a simple _id index scan misses a document due to a move (Bruce/others: do add a comment here if you can manage to reproduce this on 3.1.2+). We could add the following dbtest for the IndexScan stage as a regression test for this issue:
I suggest we consider the work for this ticket to be adding the above dbtest, and resolve this issue once it's committed.
| Comment by J Rassi [ 29/Jul/15 ] |
|
The test from the ticket description seems to fail on 2.4.14 and earlier, but pass on 2.6.0 and later (Bruce, is this consistent with your recollection?). The below script is similar to the one in the ticket description, but the update moves the document to a lower diskloc instead of a higher diskloc. This script fails on versions earlier than 3.1.1 (I tried as far back as 2.4), but passes on 3.1.2 through 3.1.6.
I confirmed with git bisect that db59e0f3 contains the fix that causes this test to pass. Interestingly, Mathias tells me that this fix was not intentional, and I can reaffirm Scott's claim that the system does not make any guarantee about whether concurrently updated documents are returned by these queries. Bruce: given all this, do you think we should resolve this issue?
| Comment by Scott Hernandez (Inactive) [ 27/Jul/14 ] |
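One way to picture why the direction of the move could matter is the toy model below. It is a sketch under stated assumptions, not the server's code: it assumes index entries are ordered by (key, record location), that a yielding cursor saves the entry it has not yet returned, and that it resumes by seeking to the first entry at or after that saved pair. The function `scan_with_move` and its parameters are hypothetical names chosen for illustration.

```python
import bisect


def scan_with_move(entries, move_to, yield_at=1):
    """Scan a toy index of (key, loc) entries with one yield and one move.

    entries  : list of (key, loc) tuples (hypothetical toy index)
    move_to  : new loc for the entry the cursor is parked on during the yield
    yield_at : position of the not-yet-returned entry the cursor is parked on
    Returns the list of keys the scan returned.
    """
    index = sorted(entries)
    returned = [key for key, _ in index[:yield_at]]

    # The cursor yields and saves the entry it has not returned yet.
    saved = index[yield_at]

    # A concurrent update moves that document: its entry is removed and
    # re-inserted under the same key with a new location (assumed behaviour).
    index.remove(saved)
    bisect.insort(index, (saved[0], move_to))

    # Naive restore: seek to the first entry >= the saved (key, loc) pair.
    resume = bisect.bisect_left(index, saved)
    returned.extend(key for key, _ in index[resume:])
    return returned


entries = [("a", 10), ("b", 20), ("c", 30)]
higher = scan_with_move(entries, move_to=40)  # "b" moves to a higher location
lower = scan_with_move(entries, move_to=5)    # "b" moves to a lower location
print(higher)
print(lower)
```

In this model the document that moves to a higher location is still found after the restore, while the one that moves to a lower location sorts before the saved position and is skipped. Whether this matches the pre-3.1.2 behaviour in detail is not confirmed anywhere in this ticket.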
|
The system is designed to work this way and snapshot queries only ensure you don't get dups. It does not ensure that you get a snapshot of "time" as all cursors are live and affected by writes. |
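The "no dups" guarantee described above can be illustrated with a small simulation. This is a minimal sketch, assuming a live cursor that re-reads the current state on every step and a single document move that happens after the first result is returned; `live_scan` and its arguments are hypothetical names, not server APIs.

```python
def live_scan(locations, by, move):
    """Scan a toy collection with a live cursor and one concurrent move.

    locations : dict mapping _id -> storage location
    by        : "location" to scan in storage order, "_id" to scan in _id order
    move      : (_id, new_location), applied after the first document is returned
    Returns the list of _ids in the order the cursor returned them.
    """
    locs = dict(locations)

    def position(doc_id):
        # Sort position of a document under the chosen scan order.
        return locs[doc_id] if by == "location" else doc_id

    seen, last = [], None
    while True:
        # Live cursor: look at the current state and take the next entry
        # strictly after the last position returned.
        ahead = [d for d in locs if last is None or position(d) > last]
        if not ahead:
            return seen
        nxt = min(ahead, key=position)
        seen.append(nxt)
        last = position(nxt)
        if len(seen) == 1:
            doc_id, new_loc = move
            locs[doc_id] = new_loc  # concurrent update relocates a document


locations = {1: 100, 2: 200, 3: 300}
natural = live_scan(locations, by="location", move=(1, 400))
by_id = live_scan(locations, by="_id", move=(1, 400))
print(natural)
print(by_id)
```

In the storage-order scan, document 1 is returned, moves ahead of the cursor, and is returned a second time. In the _id-order scan the move does not change the document's position in the scan, so each document appears once. Neither scan is a point-in-time view of the collection.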