[SERVER-81493] Handle StorageUnavailableException when resetting WiredTiger cursors Created: 27/Sep/23 Updated: 29/Oct/23 Resolved: 12/Oct/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.2.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Josef Ahmad | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Assigned Teams: | Storage Execution NAMER |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v7.0 |
| Sprint: | Execution NAMR Team 2023-10-16 |
| Participants: | |
| Linked BF Score: | 120 |
| Description |
|
A call to WT_CURSOR::reset can fail with a rollback due to cache pressure. There has been a long-standing assumption (since at least 3.6) that ignoring these rollbacks is safe because the transaction is getting killed anyway. This assumption is incorrect: query plans reset the cursor before performing the write (e.g. in the update stage). When the exception is swallowed, the write proceeds and eventually fails to commit because the transaction requires rollback.

I've linked a build failure where a replica set reconfig raced with a test designed to generate a transaction too large to fit in cache. WiredTiger rolled back the oldest transaction to ease the cache pressure. The oldest transaction happened to be the reconfig thread persisting the new configuration, and because the exception was not handled, an invariant eventually failed when that thread tried to commit the transaction.

We should stop swallowing StorageUnavailableExceptions in WiredTigerRecordStoreCursorBase::save() and WiredTigerRecordStore::RandomCursor::save(), and instead handle the exception appropriately further up the call chain. We should also investigate whether callers of WiredTigerIndexCursorGeneric::resetCursor() and PlanYieldPolicy::yieldOrInterrupt() are similarly impacted. |
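The difference between swallowing and propagating the rollback can be illustrated with a small, self-contained sketch. Everything here is a hypothetical stand-in rather than server code: `FakeCursor`, `kRollback`, `saveSwallowing`, `savePropagating`, and `saveThenWrite` are invented for illustration, and the local `StorageUnavailableException` only mimics the role of the server's exception type. The retry loop is one possible way a caller could handle the exception, not necessarily what the fix does.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the rollback error code returned by the storage engine.
constexpr int kRollback = -31800;

// Hypothetical stand-in for the server's StorageUnavailableException.
struct StorageUnavailableException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Fake cursor: reset() reports a rollback for the first `rollbacksLeft` calls.
struct FakeCursor {
    int rollbacksLeft = 0;
    int reset() {
        if (rollbacksLeft > 0) {
            --rollbacksLeft;
            return kRollback;
        }
        return 0;
    }
};

// Problematic pattern: the rollback is caught and ignored, so the caller
// continues with a write that can no longer commit.
void saveSwallowing(FakeCursor& cursor) {
    try {
        if (cursor.reset() == kRollback) {
            throw StorageUnavailableException("rollback during cursor reset");
        }
    } catch (const StorageUnavailableException&) {
        // Swallowed: nothing tells the caller the transaction must roll back.
    }
}

// Suggested pattern: let the exception escape so a caller can react to it.
void savePropagating(FakeCursor& cursor) {
    if (cursor.reset() == kRollback) {
        throw StorageUnavailableException("rollback during cursor reset");
    }
}

// Hypothetical caller: save the cursor, then write. On a rollback it retries
// up to maxAttempts times and returns the number of attempts used.
int saveThenWrite(FakeCursor& cursor, const std::function<void()>& write, int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 1;; ++attempt) {
        try {
            savePropagating(cursor);
            write();  // Only reached when the reset succeeded.
            return attempt;
        } catch (const StorageUnavailableException&) {
            if (attempt == maxAttempts) {
                throw;  // Out of attempts: surface the error to the caller.
            }
            // A real caller would abandon the storage transaction before retrying.
        }
    }
}
```

With the propagating variant, the write callback never runs after a failed reset, which is the property the swallowing variant loses.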
| Comments |
| Comment by Githook User [ 11/Oct/23 ] |
|
Author: {'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}
Message: |