[SERVER-40167] Index key removal should not encounter prepare conflicts on unrelated keys Created: 15/Mar/19  Updated: 06/Dec/22  Resolved: 12/Aug/21

Status: Closed
Project: Core Server
Component/s: Querying, Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 0
Labels: newgrad, txn_storage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-40176 Cursor seekExact should not use WT_CU... Closed
related to SERVER-41977 Index lookups may conflict with prepa... Closed
related to SERVER-59842 Setting wildcard index as multikey ca... Closed
related to SERVER-59338 Complete TODO listed in SERVER-40167 Closed
Assigned Teams:
Storage Execution
Operating System: ALL
Steps To Reproduce:

(function() {
    "use strict";
    load("jstests/core/txns/libs/prepare_helpers.js");
 
    const rst = new ReplSetTest({nodes: 1});
    rst.startSet();
    rst.initiate();
 
    const dbName = "test";
    const collName = "delete_prepare_conflict";
 
    const primary = rst.getPrimary();
    const testDB = primary.getDB(dbName);
    const testColl = testDB.getCollection(collName);
 
    assert.commandWorked(testDB.runCommand({create: collName, writeConcern: {w: "majority"}}));
 
    // Create an index on "num" so that, in principle, updates and deletes by "num" should not
    // encounter prepare conflicts with the prepared insert.
    assert.commandWorked(
        testDB.runCommand({createIndexes: collName, indexes: [{key: {"num": 1}, name: "num_1"}]}));
 
    const session = primary.startSession({causalConsistency: false});
    const sessionDB = session.getDatabase(dbName);
    const sessionColl = sessionDB.getCollection(collName);
 
    session.startTransaction();
    assert.commandWorked(sessionColl.insert({_id: 42}));  // num: null
 
    PrepareHelpers.prepareTransaction(session);
 
    // Uncommenting this allows the remove() to succeed. By inserting another document,
    // search_near will position the cursor on this index key rather than on the null key
    // for the document {_id: 42}.
    // assert.commandWorked(testColl.insert({num: 1}));
 
    assert.commandWorked(testColl.insert({num: 100}));
 
    // Blocks due to a prepare conflict: after the document is deleted, the index cursor is
    // repositioned onto the index entry for the prepared document {_id: 42}.
    assert.commandWorked(testColl.remove({num: 100}));
 
    session.abortTransaction_forTesting();
    rst.stopSet();
})();

Sprint: Storage NYC 2019-05-06, Execution Team 2021-02-08, Execution Team 2021-02-22
Participants:
Linked BF Score: 8

 Description   

After an index key is deleted, the cursor on the index is re-established by calling restoreState(). This restore uses a search_near to reposition the cursor at the original position. Since that key has now been deleted, search_near positions the cursor on the logically adjacent key instead. If the adjacent key is involved in a prepared transaction, the cursor restoration encounters a prepare conflict that it otherwise would not have.

Instead of calling restoreState() after deleting each key, which repositions the cursor on a deleted entry every single time, it may make sense to position the cursor directly on the next key to be deleted.

Note: This is only true once SERVER-39074 is completed, and all write operations start enforcing prepare conflicts.
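The repositioning behavior is visible at the WiredTiger layer with a small standalone program. This is only an illustration, not server code: it uses plain string keys, a made-up table name, and no prepared transaction, but it shows search_near landing on the neighboring key once the sought key has been removed.

// Illustrative sketch (not server code): after "key-A" is removed, a search_near
// for "key-A" repositions the cursor onto the adjacent "key-B". In the server, if
// that adjacent key was written by a prepared transaction, the same seek hits
// WT_PREPARE_CONFLICT and the caller blocks until the transaction resolves.
// Error handling is omitted for brevity; "." must be a writable directory.
#include <wiredtiger.h>
#include <cstdio>

int main() {
    WT_CONNECTION* conn;
    WT_SESSION* session;
    WT_CURSOR* cursor;

    wiredtiger_open(".", nullptr, "create", &conn);
    conn->open_session(conn, nullptr, nullptr, &session);
    session->create(session, "table:idx_demo", "key_format=S,value_format=S");
    session->open_cursor(session, "table:idx_demo", nullptr, nullptr, &cursor);

    // Two adjacent index keys.
    cursor->set_key(cursor, "key-A");
    cursor->set_value(cursor, "");
    cursor->insert(cursor);
    cursor->set_key(cursor, "key-B");
    cursor->set_value(cursor, "");
    cursor->insert(cursor);

    // Delete "key-A", then try to re-seek it, as a cursor restore would.
    cursor->set_key(cursor, "key-A");
    cursor->remove(cursor);

    int exact;
    cursor->set_key(cursor, "key-A");
    cursor->search_near(cursor, &exact);   // lands on the nearest remaining key
    const char* found;
    cursor->get_key(cursor, &found);
    std::printf("search_near landed on %s (exact=%d)\n", found, exact);  // key-B, exact > 0

    conn->close(conn, nullptr);
    return 0;
}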

Example stacktrace:

#0  0x00007f1f580f83f7 in poll () from /lib64/libc.so.6
#1  0x00007f1f6598529c in mongo::transport::TransportLayerASIO::BatonASIO::run (this=0x7f1ee00c35f0, clkSource=0x561bc5e03b20)
    at src/mongo/transport/baton_asio_linux.h:315
#2  0x00007f1f5a44c2b2 in mongo::Waitable::wait(mongo::Waitable*, mongo::ClockSource*, mongo::stdx::condition_variable&, std::unique_lock<std::mutex>&)::{lambda()#1}::operator()() const (__closure=0x7f1f3c4cd1b0) at src/mongo/util/waitable.h:59
#3  0x00007f1f5a44d083 in mongo::stdx::condition_variable::_runWithNotifyable<mongo::Waitable::wait(mongo::Waitable*, mongo::ClockSource*, mongo::stdx::condition_variable&, std::unique_lock<std::mutex>&)::{lambda()#1}>(mongo::Notifyable&, mongo::Waitable::wait(mongo::Waitable*, mongo::ClockSource*, mongo::stdx::condi
tion_variable&, std::unique_lock<std::mutex>&)::{lambda()#1}&&) (this=0x561bc5f84330, notifyable=..., cb=...) at src/mongo/stdx/condition_variable.h:164
#4  0x00007f1f5a44c326 in mongo::Waitable::wait (waitable=0x7f1ee00c35f0, clkSource=0x561bc5e03b20, cv=..., lk=...) at src/mongo/util/waitable.h:57
#5  0x00007f1f5a44a8d9 in mongo::OperationContext::<lambda()>::operator()(void) const (__closure=0x7f1f3c4cd270) at src/mongo/db/operation_context.cpp:287
#6  0x00007f1f5a44aae0 in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil (this=0x7f1ee00e3fe0, cv=..., m=..., deadline=...)
    at src/mongo/db/operation_context.cpp:292
#7  0x00007f1f659797f6 in mongo::Interruptible::waitForConditionOrInterruptNoAssert (this=0x7f1ee00e3fe0, cv=..., m=...) at src/mongo/util/interruptible.h:229
#8  0x00007f1f65979728 in mongo::Interruptible::waitForConditionOrInterrupt (this=0x7f1ee00e3fe0, cv=..., m=...) at src/mongo/util/interruptible.h:206
#9  0x00007f1f63ab541f in mongo::Interruptible::waitForConditionOrInterrupt<mongo::WiredTigerSessionCache::waitUntilPreparedUnitOfWorkCommitsOrAborts(mongo::OperationContext*, uint64_t)::<lambda()> >(mongo::stdx::condition_variable &, std::unique_lock<std::mutex> &, mongo::WiredTigerSessionCache::<lambda()>) (
    this=0x7f1ee00e3fe0, cv=..., m=..., pred=...) at src/mongo/util/interruptible.h:219
#10 0x00007f1f63ab44ac in mongo::WiredTigerSessionCache::waitUntilPreparedUnitOfWorkCommitsOrAborts (this=0x561bc5f841f0, opCtx=0x7f1ee00e3fe0, lastCount=0)
    at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp:310
#11 0x00007f1f63a4d23e in mongo::wiredTigerPrepareConflictRetry<mongo::(anonymous namespace)::WiredTigerIndexCursorBase::seekWTCursor(const mongo::KeyString&)::<lambda()> >(mongo::OperationContext *, mongo::(anonymous namespace)::WiredTigerIndexCursorBase::<lambda()> &&) (opCtx=0x7f1ee00e3fe0, f=...)
    at src/mongo/db/storage/wiredtiger/wiredtiger_prepare_conflict.h:137
#12 0x00007f1f63a47420 in mongo::(anonymous namespace)::WiredTigerIndexCursorBase::seekWTCursor (this=0x7f1ee00c7500, query=...)
    at src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp:1028
#13 0x00007f1f63a46966 in mongo::(anonymous namespace)::WiredTigerIndexCursorBase::restore (this=0x7f1ee00c7500)
    at src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp:902
#14 0x00007f1f5f6bfec9 in mongo::IndexScan::doRestoreStateRequiresIndex (this=0x561bc5f801a0) at src/mongo/db/exec/index_scan.cpp:246
#15 0x00007f1f5f6f7163 in mongo::RequiresIndexStage::doRestoreStateRequiresCollection (this=0x561bc5f801a0) at src/mongo/db/exec/requires_index_stage.cpp:73
#16 0x00007f1f5f6f63a3 in mongo::RequiresCollectionStageBase<mongo::Collection const*>::doRestoreState (this=0x561bc5f801a0)
    at src/mongo/db/exec/requires_collection_stage.cpp:70
#17 0x00007f1f5f6d689f in mongo::PlanStage::restoreState (this=0x561bc5f801a0) at src/mongo/db/exec/plan_stage.cpp:75
#18 0x00007f1f5f6d687a in mongo::PlanStage::restoreState (this=0x7f1ee0058640) at src/mongo/db/exec/plan_stage.cpp:72
#19 0x00007f1f5f69f0b9 in mongo::DeleteStage::doWork (this=0x7f1ee00c3c70, out=0x7f1f3c4ce210) at src/mongo/db/exec/delete.cpp:235
#20 0x00007f1f5f6d6674 in mongo::PlanStage::work (this=0x7f1ee00c3c70, out=0x7f1f3c4ce210) at src/mongo/db/exec/plan_stage.cpp:47
#21 0x00007f1f5f772295 in mongo::PlanExecutorImpl::_getNextImpl (this=0x7f1ee0004230, objOut=0x7f1f3c4ce380, dlOut=0x0)
    at src/mongo/db/query/plan_executor_impl.cpp:529
#22 0x00007f1f5f77149d in mongo::PlanExecutorImpl::getNext (this=0x7f1ee0004230, objOut=0x7f1f3c4ce3e0, dlOut=0x0)
    at src/mongo/db/query/plan_executor_impl.cpp:382
#23 0x00007f1f5f772c13 in mongo::PlanExecutorImpl::executePlan (this=0x7f1ee0004230) at src/mongo/db/query/plan_executor_impl.cpp:644
#24 0x00007f1f61dcdb08 in mongo::performSingleDeleteOp (opCtx=0x7f1ee00e3fe0, ns=..., stmtId=-1, op=...) at src/mongo/db/ops/write_ops_exec.cpp:889
#25 0x00007f1f61dce79f in mongo::performDeletes (opCtx=0x7f1ee00e3fe0, wholeOp=...) at src/mongo/db/ops/write_ops_exec.cpp:967
#26 0x00007f1f56ee85e5 in mongo::(anonymous namespace)::CmdDelete::Invocation::runImpl (this=0x7f1ee00da0b0, opCtx=0x7f1ee00e3fe0, result=...)
    at src/mongo/db/commands/write_commands/write_commands.cpp:428



 Comments   
Comment by Githook User [ 08/Sep/21 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-59338 Remove TODO in code for SERVER-40167
Branch: master
https://github.com/mongodb/mongo/commit/7827431a24b7e37c5e282743511b6bfb6310d401

Comment by Louis Williams [ 12/Aug/21 ]

We have decided that we will not fix this problem. Prepare conflicts are meant to resolve as quickly as possible, so any visible stalls caused by this behavior should not last for long periods of time.

Comment by Louis Williams [ 27/Jun/19 ]

I recently filed SERVER-41977, which describes a similar issue but can occur on all updates and key removals, even if the key exists.

Comment by Louis Williams [ 07/May/19 ]

Flagging for scheduling. The work required would be to not restore cursors in the delete code path. Additionally, this issue is hard to observe in the wild, and likely would be hard to distinguish from expected prepare conflicts.

Comment by Judah Schvimer [ 01/May/19 ]

Removing from the prepare epic so this can be prioritized by the storage team separately from the rest of the project.

Comment by Louis Williams [ 28/Mar/19 ]

The storage issue is that Cursor::restore() on a missing record may accidentally land on an adjacent record which is part of a prepared transaction. This would ideally be fixed by SERVER-40176.

The query involvement is that the DeleteStage calls saveState(), deletes a document, then restoreState(). This closes and repositions the cursor on a deleted record, which may introduce this undesirable behavior.

The goal of this ticket would be to not restore Index/CollectionScan cursors after deleting documents, because they always reposition on deleted records. Even if SERVER-40176 is completed, I think this will still be an issue, because cursor restoration depends on approximate key searching. That being said, I'm not sure how this fix would be implemented given the current code.

I'll assign this back to storage so we can investigate how this would actually work.
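A toy model of the two strategies (std::map standing in for the index; purely illustrative, not DeleteStage code): restoring by re-seeking the saved key necessarily lands on the neighboring entry once the key is gone, whereas holding a position on the next candidate before deleting never has to re-seek a deleted key.

// Toy model only -- std::map stands in for the index; none of this is server code.
#include <iostream>
#include <map>
#include <string>

int main() {
    // "key-100" is the entry being deleted; "key-prepared" stands for an adjacent
    // entry that, in the server, might belong to a prepared transaction.
    std::map<std::string, int> index{{"key-100", 1}, {"key-prepared", 2}};

    // Strategy A (current behavior): delete the entry, then restore the cursor by
    // re-seeking the saved key. The key is gone, so the seek lands on the neighbor.
    std::string saved = "key-100";
    index.erase(saved);
    auto restored = index.lower_bound(saved);  // analogous to search_near on restore
    std::cout << "restore landed on: " << restored->first << "\n";  // key-prepared

    // Strategy B (roughly the idea in this ticket): advance to the next candidate
    // first, then delete, so the cursor never has to be re-established by an
    // approximate seek over entries that no longer exist.
    std::map<std::string, int> index2{{"key-100", 1}, {"key-prepared", 2}};
    auto doomed = index2.find("key-100");
    auto next = std::next(doomed);  // position on the next candidate before deleting
    index2.erase(doomed);
    if (next != index2.end())
        std::cout << "next candidate: " << next->first << "\n";  // reached without a re-seek
    return 0;
}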

Comment by Craig Homa [ 28/Mar/19 ]

Hey louis.williams, the Query team is wondering why this is in their bucket. Should this sit with sharding instead?

Comment by Louis Williams [ 18/Mar/19 ]

I think the reason is simply that each passthrough test operates on its own session and its own collection. We’ll never see this issue there because it only happens when different sessions are operating on the same collection. Even for FSM tests, prepared transactions always end up getting committed or aborted. So even if an operation encounters a prepare conflict, the conflicting prepared transactions will eventually finish, and the conflict will resolve. This issue seems isolated only to targeted replica set tests where prepared transactions are held open and another operation is attempted.

Comment by Judah Schvimer [ 18/Mar/19 ]

louis.williams, why do you not expect this to happen in jscore passthroughs that wrap crud ops in transactions or concurrency suites that use transactions?

Comment by Louis Williams [ 18/Mar/19 ]

judah.schvimer I wouldn't say this is 'blocking' passthrough testing. If this failure were to occur outside the suites in which it's been observed, it may be tricky to diagnose. Outside of testing, my concern is the difficulty of diagnosis if a user happens to encounter this. It would be very unexpected, and in general, is not correct behavior for MongoDB.

Comment by Judah Schvimer [ 16/Mar/19 ]

louis.williams, am I correct that this is effectively blocking a lot of our passthrough testing for sharded transactions because we would get hard to diagnose BFs as a result? If so, I will mark this as a "Blocker" to escalate its priority.

Comment by Louis Williams [ 15/Mar/19 ]

I filed SERVER-40176 for seekExact to perform a cursor->search rather than a cursor->search_near.
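For reference, the difference between the two calls can be seen with the same kind of standalone WiredTiger sketch as in the description (illustrative only; table and key names are made up): search() requires an exact match and reports WT_NOTFOUND for a missing key, while search_near() repositions onto the nearest existing key.

// Illustrative sketch (not server code): search() vs. search_near() on a missing key.
// Error handling is omitted for brevity; "." must be a writable directory.
#include <wiredtiger.h>
#include <cstdio>

int main() {
    WT_CONNECTION* conn;
    WT_SESSION* session;
    WT_CURSOR* cursor;

    wiredtiger_open(".", nullptr, "create", &conn);
    conn->open_session(conn, nullptr, nullptr, &session);
    session->create(session, "table:seek_demo", "key_format=S,value_format=S");
    session->open_cursor(session, "table:seek_demo", nullptr, nullptr, &cursor);

    cursor->set_key(cursor, "neighbor");
    cursor->set_value(cursor, "");
    cursor->insert(cursor);

    // Exact-match seek: never drifts onto another key.
    cursor->set_key(cursor, "missing");
    int ret = cursor->search(cursor);
    std::printf("search: %s\n", wiredtiger_strerror(ret));  // WT_NOTFOUND

    // Approximate seek: silently repositions onto the nearest existing key.
    int exact;
    cursor->set_key(cursor, "missing");
    cursor->search_near(cursor, &exact);
    const char* key;
    cursor->get_key(cursor, &key);
    std::printf("search_near landed on %s (exact=%d)\n", key, exact);  // neighbor

    conn->close(conn, nullptr);
    return 0;
}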
