[SERVER-41282] Investigate invariant failure in WTIndex::updatePosition() Created: 22/May/19  Updated: 28/May/19  Resolved: 28/May/19

Status: Closed
Project: Core Server
Component/s: Sharding, Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Janna Golden Assignee: Gregory Wlodarek
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File 0001-SERVER-39141-The-range-deleter-should-retry-on-write.patch    
Issue Links:
Related
is related to SERVER-41311 Invariant that restore() is called on... Closed
Operating System: ALL
Sprint: Execution Team 2019-06-03
Participants:

 Description   

I hit this invariant while testing a change to the range deleter. It reproduces only in the concurrency_local_read_write_multi_stmt_txn suite, when running either of random_movechunk_broadcast_delete_transaction.js/update_transaction.js. The sequence that triggers it is:

1. A transaction starts at the same time as the range deleter starts running. (The coordinator shard is also the shard that the range deleter is running on.)
2. The range deleter continually retries on WCE (I added logging, so I can confirm this).
3. The coordinator shard receives votes to commit, advances the cluster time to the commit timestamp, and writes the decision.
4. The coordinator sends commit to itself and the other participant.
5. The coordinator receives commit, and then the range deleter hits the invariant `Invariant failure !getTestCommandsEnabled()` in `updatePosition()` in wiredtiger_index.cpp and logs `WTIndex::updatePosition – the new key ( 2BCA041DB0) is less than the previous key (2C01B20406D0), which is a bug.`
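
To make the logged comparison concrete, here is a minimal sketch that decodes the two hex keys from the log line and compares them byte by byte. It assumes index keys are ordered as raw unsigned bytes; `fromHex` and `keyLess` are hypothetical helper names for illustration, not functions from the server code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode a hex string such as "2BCA04" into raw bytes.
// Assumes valid, even-length hex input.
std::vector<std::uint8_t> fromHex(const std::string& hex) {
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        bytes.push_back(
            static_cast<std::uint8_t>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    }
    return bytes;
}

// Hypothetical helper: true if key `a` sorts before key `b` under a
// lexicographic comparison of unsigned bytes (an assumption about key order).
bool keyLess(const std::string& a, const std::string& b) {
    return fromHex(a) < fromHex(b);  // std::vector operator< is lexicographic
}
```

Under that assumption, `keyLess("2BCA041DB0", "2C01B20406D0")` is true because the first bytes already differ (0x2B < 0x2C), so a forward scan that returns the first key after the second has moved backwards.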

I've attached a git format-patch to repro.



 Comments   
Comment by Gregory Wlodarek [ 28/May/19 ]

The issue was that the PlanExecutor was using the NO_YIELD policy, which requires the caller (CollectionRangeDeleter) to call saveState() and restoreState() itself. Wrapping the getNext() call inside the writeConflictRetry() macro caused us to abandon our snapshot on retry without restoring the cursor, which is what trips the invariant. Changing the yield policy to WRITE_CONFLICT_RETRY_ONLY and removing the writeConflictRetry() macro resolves this issue.
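
The failure mode can be illustrated with a toy model. This is a sketch under stated assumptions, not the server's code: `ToyCursor`, `WriteConflict`, `drainRetryOnly`, and `drainSaveRestore` are hypothetical names, and the cursor is a simplified stand-in in which abandoning the snapshot resets the position unless the caller saves and restores it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a write conflict exception.
struct WriteConflict : std::runtime_error {
    WriteConflict() : std::runtime_error("write conflict") {}
};

// Toy forward cursor over sorted keys (not the real WiredTiger cursor).
class ToyCursor {
public:
    // Injects one WriteConflict once `conflictAfter` keys have been returned.
    ToyCursor(std::vector<int> sortedKeys, std::size_t conflictAfter)
        : _keys(std::move(sortedKeys)), _conflictAfter(conflictAfter) {}

    // Remember the last returned key so the position can be re-established.
    void save() { _savedKey = _last; }

    // Re-seek to the first key greater than the saved key.
    void restore() {
        _pos = 0;
        while (_savedKey && _pos < _keys.size() && _keys[_pos] <= *_savedKey) ++_pos;
    }

    // Models dropping the snapshot: the in-memory position is lost.
    void abandonSnapshot() { _pos = 0; }

    std::optional<int> next() {
        if (!_conflictFired && _returned == _conflictAfter) {
            _conflictFired = true;
            throw WriteConflict();
        }
        if (_pos >= _keys.size()) return std::nullopt;
        int key = _keys[_pos++];
        // Models the invariant: a forward scan must never move backwards.
        if (_last && key <= *_last)
            throw std::logic_error("new key is not greater than the previous key");
        _last = key;
        ++_returned;
        return key;
    }

private:
    std::vector<int> _keys;
    std::size_t _conflictAfter;
    std::size_t _pos = 0;
    std::size_t _returned = 0;
    bool _conflictFired = false;
    std::optional<int> _last;
    std::optional<int> _savedKey;
};

// Retries on conflict but never saves or restores the cursor.
std::vector<int> drainRetryOnly(ToyCursor& c) {
    std::vector<int> out;
    for (;;) {
        try {
            auto k = c.next();
            if (!k) return out;
            out.push_back(*k);
        } catch (const WriteConflict&) {
            c.abandonSnapshot();  // position lost, nothing restores it
        }
    }
}

// Saves before abandoning the snapshot and restores before retrying.
std::vector<int> drainSaveRestore(ToyCursor& c) {
    std::vector<int> out;
    for (;;) {
        try {
            auto k = c.next();
            if (!k) return out;
            out.push_back(*k);
        } catch (const WriteConflict&) {
            c.save();
            c.abandonSnapshot();
            c.restore();
        }
    }
}
```

In this model, `drainRetryOnly` trips the ordering check on the first read after the conflict, while `drainSaveRestore` resumes from the key after the last one returned. The actual fix delegates this bookkeeping to the PlanExecutor through the WRITE_CONFLICT_RETRY_ONLY yield policy rather than doing it in the caller.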
