[SERVER-40705] Stepdown with prepared transaction that performed text search triggers invariant Created: 17/Apr/19  Updated: 29/Oct/23  Resolved: 09/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.12

Type: Bug Priority: Major - P3
Reporter: Jack Mulrow Assignee: Lingzhi Deng
Resolution: Fixed Votes: 0
Labels: prepare_durability, todo_in_code
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-40723 Deadlock between S lock acquisition o... Closed
related to SERVER-42536 Complete TODO listed in SERVER-40705 Closed
related to SERVER-43481 Complete TODO listed in SERVER-40705 Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Run this jstest:

(function() {
    "use strict";
    load("jstests/core/txns/libs/prepare_helpers.js");  // for prepareTransaction
 
    const rst = new ReplSetTest({nodes: 1});
    rst.startSet();
    rst.initiate();
 
    const session = rst.getPrimary().startSession();
    const sessionDB = session.getDatabase("test");
    const sessionColl = sessionDB.getCollection("foo");
 
    // Set up a collection with a text index and one document with the text field.
    assert.commandWorked(sessionDB.runCommand({create: "foo", writeConcern: {w: "majority"}}));
    assert.commandWorked(sessionColl.createIndex({text: "text"}));
    assert.commandWorked(sessionColl.insert({text: "dummy text for repro"}));
 
    // Perform a text search in a transaction and prepare it.
    session.startTransaction();
    assert.eq(sessionColl.find({$text: {$search: "repro"}}).itcount(), 1);
    assert.eq(sessionColl.find().itcount(), 1);
    const prepareTimestamp = PrepareHelpers.prepareTransaction(session);
 
    // Stepping down the primary triggers an invariant and the test fails here.
    assert.adminCommandWorkedAllowingNetworkError(rst.getPrimary(),
                                                  {replSetStepDown: 1, force: true});
 
    PrepareHelpers.commitTransaction(session, prepareTimestamp);
 
    rst.stopSet();
})();

Sprint: Repl 2019-05-06, Repl 2019-05-20
Participants:

 Description   

When a primary steps down, the locks for all currently prepared transactions are yielded. As part of this, LockerImpl::releaseUnitOfWork() is called for each prepared transaction, which invariants the transaction's locker is not global locked recursively. This invariant fails if the prepared transaction read using a text index, like in the attached repro.

Example backtrace:

[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.560-0400 d20020| 2019-04-17T17:57:23.560-0400 F  -        [conn1] Invariant failure !isGlobalLockedRecursively() src/mongo/db/concurrency/lock_state.cpp 440
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.560-0400 d20020| 2019-04-17T17:57:23.560-0400 F  -        [conn1]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.560-0400 d20020|
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.560-0400 d20020| ***aborting after invariant() failure
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.560-0400 d20020|
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.560-0400 d20020|
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.717-0400 d20020| 2019-04-17T17:57:23.717-0400 F  -        [conn1] Got signal: 6 (Aborted).
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020| ----- BEGIN BACKTRACE -----
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  mongod(mongo::printStackTrace(std::ostream&) 0x36) [0x55d0468bb0ab]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  mongod( 0x5A8B18A) [0x55d0468ba18a]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  mongod( 0x5A8B5D2) [0x55d0468ba5d2]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  libpthread.so.0( 0x12890) [0x7fa8f363c890]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  libc.so.6(gsignal 0xC7) [0x7fa8f3277e97]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  libc.so.6(abort 0x141) [0x7fa8f3279801]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  mongod(mongo::invariantFailedWithMsg(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned int) 0x0) [0x55d0468acfc6]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  mongod(void mongo::invariantWithLocation<bool>(bool const&, char const*, char const*, unsigned int) 0x3F) [0x55d043f63c95]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  mongod(mongo::LockerImpl::releaseWriteUnitOfWork(mongo::Locker::LockSnapshot*) 0xA6) [0x55d0466fe51c]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  mongod(mongo::TransactionParticipant::TxnResources::TxnResources(mongo::WithLock, mongo::OperationContext*, mongo::TransactionParticipant::TxnResources::StashStyle) 0x332) [0x55d044ac3ffa]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  mongod(mongo::TransactionParticipant::Participant::refreshLocksForPreparedTransaction(mongo::OperationContext*, bool) 0x20C) [0x55d044ac59ec]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.718-0400 d20020|  mongod( 0x3BA6D92) [0x55d0449d5d92]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod( 0x3BA8705) [0x55d0449d7705]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod(std::function<void (mongo::OperationContext*, mongo::SessionCatalog::SessionToKill const&)>::operator()(mongo::OperationContext*, mongo::SessionCatalog::SessionToKill const&) const 0x61) [0x55d0449d9043]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod( 0x3BA5DBF) [0x55d0449d4dbf]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod(mongo::yieldLocksForPreparedTransactions(mongo::OperationContext*) 0x259) [0x55d0449d602a]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod(mongo::repl::ReplicationCoordinatorImpl::stepDown(mongo::OperationContext*, bool, mongo::Duration<std::ratio<1l, 1000l> > const&, mongo::Duration<std::ratio<1l, 1000l> > const&) 0x67E) [0x55d04409f8b0]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod(mongo::repl::CmdReplSetStepDown::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) 0x425) [0x55d04405e6c5]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod(mongo::BasicCommand::Invocation::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*) 0xA8) [0x55d046055726]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod( 0x3760955) [0x55d04458f955]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod( 0x3763044) [0x55d044592044]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod( 0x3764104) [0x55d044593104]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod( 0x3764A21) [0x55d044593a21]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod(mongo::ServiceEntryPointCommon::handleRequest(mongo::OperationContext*, mongo::Message const&, mongo::ServiceEntryPointCommon::Hooks const&) 0x3C4) [0x55d044595caa]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod(mongo::ServiceEntryPointMongod::handleRequest(mongo::OperationContext*, mongo::Message const&) 0x54) [0x55d04457e366]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod(mongo::ServiceStateMachine::_processMessage(mongo::ServiceStateMachine::ThreadGuard) 0x24B) [0x55d04458a53f]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod(mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard) 0x192) [0x55d04458632a]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.719-0400 d20020|  mongod( 0x3757704) [0x55d044586704]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod( 0x3758C22) [0x55d044587c22]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod(std::function<void ()>::operator()() const 0x32) [0x55d043fa2df4]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod(mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()>, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName) 0x145) [0x55d045be78a9]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod(mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership) 0xDF) [0x55d04458689d]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod(mongo::ServiceStateMachine::_sourceCallback(mongo::Status) 0x133) [0x55d04458592b]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod( 0x3756351) [0x55d044585351]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod( 0x3757F61) [0x55d044586f61]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod( 0x3757FF3) [0x55d044586ff3]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod( 0x37583C0) [0x55d0445873c0]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod( 0x3758557) [0x55d044587557]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod( 0x3757E19) [0x55d044586e19]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard) 0xE2) [0x55d044585478]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod(mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard) 0x141) [0x55d0445862d9]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod( 0x3757704) [0x55d044586704]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod( 0x3758C22) [0x55d044587c22]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.720-0400 d20020|  mongod(std::function<void ()>::operator()() const 0x32) [0x55d043fa2df4]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.721-0400 d20020|  mongod( 0x4DB86E1) [0x55d045be76e1]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.721-0400 d20020|  mongod( 0x4DB8D7E) [0x55d045be7d7e]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.721-0400 d20020|  mongod(std::function<void ()>::operator()() const 0x32) [0x55d043fa2df4]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.721-0400 d20020|  mongod( 0x5754B9E) [0x55d046583b9e]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.721-0400 d20020|  libpthread.so.0( 0x76DB) [0x7fa8f36316db]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.721-0400 d20020|  libc.so.6(clone 0x3F) [0x7fa8f335a88f]
[js_test:text_prepared_transaction_repro] 2019-04-17T17:57:23.721-0400 d20020| -----  END BACKTRACE  -----



 Comments   
Comment by Githook User [ 08/May/19 ]

Author:

{'name': 'Lingzhi Deng', 'username': 'ldennis', 'email': 'lingzhi.deng@mongodb.com'}

Message: SERVER-40705: No need to delay unlocking if lock is recursively acquired
Branch: master
https://github.com/mongodb/mongo/commit/44f49a9a97b382e07beca0d3fabf5991eb3a41fa

Comment by Lingzhi Deng [ 23/Apr/19 ]

We might consider not yielding locks anymore on secondaries for SERVER-40723. If we decide not to yield locks, this will not be a problem anymore. 

Comment by Lingzhi Deng [ 23/Apr/19 ]

The global lock are taken recursively here while the transaction itself already has the global lock. It feels to me that this could happen for any commands that run in a transaction but use AutoGetDb to get their own database locks. Aggregation is another example. Running an aggregate in a transaction that requires resolving namespaces would trigger the same invariant.

Generated at Thu Feb 08 04:55:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.