[SERVER-16675] getMore looks up doc with invalid RecordId, fails with "Didn't find RecordId in WiredTigerRecordStore" Created: 27/Dec/14 Updated: 21/Jan/15 Resolved: 07/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Storage |
| Affects Version/s: | 2.8.0-rc4 |
| Fix Version/s: | 2.8.0-rc5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kamran K. | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | 28qa, wiredtiger |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: |
|
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
The included script reliably triggers a 'Didn't find RecordId in WiredTigerRecordStore' error. I wasn't able to trigger the error by using a non-text index and non-text queries in the script. This issue might be related to Version: b0014456 |
| Comments |
| Comment by Githook User [ 07/Jan/15 ] |
|
Author: David Storch (username: dstorch, email: david.storch@10gen.com)
Message: This fixes an issue with WiredTiger query isolation. |
| Comment by J Rassi [ 28/Dec/14 ] |
|
This report exposes a design issue related to query isolation when using WiredTiger (and no, this isn't related to

In mmapv1, a document update or delete generates a broadcast call to PlanExecutor::invalidate() on all registered PlanExecutor objects. invalidate() instructs the PlanExecutor to wipe any saved state it has associated with the given RecordId, as the record may no longer exist or match the original predicate.

In WiredTiger, reads have snapshot isolation, so document updates/deletes do not affect other operations' active reads. Thus, for WiredTiger, the server never makes any calls to PlanExecutor::invalidate(). The supporting comment in CursorManager::invalidateDocument() reads: "If a storage engine supports doc locking, then we do not need to invalidate. The transactional boundaries of the operation protect us."

The issue here is that queries drop their snapshot between calls to getMore; this violates the assumption that the operation is protected from other writes by its transactional boundaries. Specifically: query stages are allowed to save references to RecordIds that they encounter, and the query subsystem guarantees that each RecordId will continue to refer to the same exact document until it is invalidated. In this case, the document is deleted, and the query's new snapshot reflects that the document has been deleted, but the stage was never notified of the deletion.

Before a query starts operating on a new snapshot, it needs to be delivered every invalidate() notification that's been generated since the creation of its previous snapshot.

This issue is particularly easy to reproduce with the TEXT stage, which buffers the entire RecordId result set before returning any documents to the user. Though, many stages buffer RecordIds. See the following repro which uses SORT_MERGE (SORT_MERGE buffers the RecordId of the upcoming document from each child scan):
Marking 2.8.0-rc5. cc eliot, schwerin, redbeard0531, david.storch. Thanks for the report, kamran.khan. |
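
The isolation problem described in the comment above can be illustrated with a small toy model. This is only a sketch and not MongoDB code: `ToyRecordStore`, `BufferingStage`, and `run_query` are hypothetical names invented for the illustration, and a plain dictionary stands in for the storage engine. The stage buffers every RecordId up front (as the TEXT stage is described as doing), a delete happens between two batches (standing in for the snapshot being dropped between getMore calls), and the second batch either fails or succeeds depending on whether the stage was told about the delete.

```python
class ToyRecordStore:
    """Hypothetical stand-in for a record store: maps RecordId -> document."""

    def __init__(self, docs):
        self._docs = dict(docs)

    def delete(self, record_id):
        del self._docs[record_id]

    def fetch(self, record_id):
        # Mirrors the spirit of the reported error, not its exact code path.
        if record_id not in self._docs:
            raise KeyError(f"Didn't find RecordId {record_id}")
        return self._docs[record_id]


class BufferingStage:
    """Hypothetical query stage that saves all RecordIds before returning any."""

    def __init__(self, store, record_ids):
        self._store = store
        self._buffered = list(record_ids)

    def invalidate(self, record_id):
        # Drop saved state for a record that may no longer exist.
        if record_id in self._buffered:
            self._buffered.remove(record_id)

    def next_batch(self, size):
        batch_ids = self._buffered[:size]
        self._buffered = self._buffered[size:]
        return [self._store.fetch(rid) for rid in batch_ids]


def run_query(deliver_invalidations):
    store = ToyRecordStore({1: "a", 2: "b", 3: "c", 4: "d"})
    stage = BufferingStage(store, [1, 2, 3, 4])

    first = stage.next_batch(2)

    # Between batches the query holds no snapshot, so a concurrent
    # delete becomes visible to the next batch.
    store.delete(3)
    if deliver_invalidations:
        stage.invalidate(3)

    second = stage.next_batch(2)
    return first + second
```

In this model, `run_query(False)` raises a `KeyError` on the second batch because the stage still holds RecordId 3, while `run_query(True)` skips the deleted record and returns the remaining documents. That matches the remedy proposed in the comment: deliver pending invalidations before the query starts operating on a new snapshot.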