[SERVER-71674] Yield handlers not handling WCE thrown from restoring cursors Created: 29/Nov/22 Updated: 29/Nov/23 |
|
| Status: | Blocked |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Yuhong Zhang | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Assigned Teams: |
Storage Execution
|
||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||
| Sprint: | Execution Team 2023-02-20, Execution Team 2023-02-06, Execution Team 2023-03-06 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Linked BF Score: | 6 | ||||||||||||||||||||||||
| Description |
|
In some query stages like delete, we will need to call restoreState() in case the actual delete throws a WCE which invalidates the cursor. But restoreState() can also throw a WCE which is not handled. This can cause the following next() call lands the cursor to the beginning of the collection again, hitting the out-of-order invariant. |
| Comments |
| Comment by Dianna Hohensee (Inactive) [ 27/Feb/23 ] |
|
I'm going to put this ticket back into Needs Triage to gain wider support for either (a) removing the failpoint testing for artificial read path WCEs A pro of keeping the testing active would be to catch new use cases. Although it's unclear to me how useful the testing currently is in that regard, since we clearly have scenarios and they are rarely caught. Nor does the failure signal help us with the final solution (complete read path WCE handling). A con of keeping the testing active is it could lead to more crazy long investigations when the source of the failure is very non-obvious. |
| Comment by Yuhong Zhang [ 21/Feb/23 ] |
|
I pointed Dianna to the wrong failpoint. The one that throws WCEs is actually here. I'm unlinking |
| Comment by Dianna Hohensee (Inactive) [ 30/Jan/23 ] |
|
Linking |
| Comment by Dianna Hohensee (Inactive) [ 20/Jan/23 ] |
|
I spoke offline with Yuhong. The idea here is that any query stage ::restoreState() function call, like this one where the WCE arose in the linked test failure , can potentially throw a WCE. This happens because after an initial WCE A causes the query stage to request a yield, the query restore state logic ultimately calls WT::search_near() to reposition the cursor that was unpositioned by WCE A and can throw a new WCE B. This is the testing-only failpoint to randomly throw artificial WCEs. Because any WT read in theory could throw a WCE. However, I don't believe WT ever does throw a WCE for a read today. It only happens artificially in the MDB layer for testing purposes. And I don't believe query code today is prepared to deal with a WCE. And that is what is happening in our test suite artificially, and the motivation for this ticket. Therefore the options for this ticket are, (a) we start supporting WCE for reads in the query layer. On first thought for (a), we retry the WCE someplace in the query operation stack, another WCE retry loop around the query ::restoreState() calls, or inside said function call. In theory an operation can throw WCEs endlessly, but that's true today for all WCE handling. (c) we modify our testing not to throw artificial WCEs in the query read path / for search_near() calls. |
| Comment by Fausto Leyva (Inactive) [ 29/Nov/22 ] |
|
When this ticket is addressed, we should request QE for a code review. |