[SERVER-40321] Rolling back a prepared transaction on a capped collection leads to an invariant failure Created: 25/Mar/19  Updated: 29/Oct/23  Resolved: 17/Apr/19

Status: Closed
Project: Core Server
Component/s: Concurrency, Replication
Affects Version/s: None
Fix Version/s: 4.1.11

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 0
Labels: prepare_durability, rbfz, txn_storage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-12641 Docs for SERVER-40321: Rolling back a... Closed
Related
related to SERVER-42372 Reads against capped collections aren... Closed
related to SERVER-40684 Ban transactions against capped colle... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

python buildscripts/resmoke.py --suites=no_server repro_server40321.js

repro_server40321.js

(function() {
    "use strict";
 
    load("jstests/core/txns/libs/prepare_helpers.js");
    load("jstests/replsets/libs/rollback_test.js");
 
    const rollbackTest = new RollbackTest();
    const primary = rollbackTest.getPrimary();
 
    // Operations performed from this point on will be rolled back.
    rollbackTest.transitionToRollbackOperations();
 
    const testDB = primary.getDB("test");
    assert.commandWorked(testDB.runCommand({create: "mycoll", capped: true, size: 4096}));
 
    const session = primary.startSession({causalConsistency: false});
    const sessionDB = session.getDatabase(testDB.getName());
 
    session.startTransaction();
    // Insert into the capped collection; this acquires the RESOURCE_METADATA
    // MODE_X lock described below, held until the end of the WUOW.
    assert.commandWorked(sessionDB.mycoll.insert({_id: 0}));
 
    PrepareHelpers.prepareTransaction(session);
 
    // Stepping down the old primary yields locks for the prepared transaction,
    // which trips the invariant in LockerImpl::saveLockStateAndUnlock().
    rollbackTest.transitionToSyncSourceOperationsBeforeRollback();
    rollbackTest.transitionToSyncSourceOperationsDuringRollback();
    rollbackTest.transitionToSteadyStateOperations();
 
    rollbackTest.stop();
})();

Sprint: Storage NYC 2019-04-08, Storage NYC 2019-04-22
Linked BF Score: 0

 Description   

Inserting a document into a capped collection acquires a RESOURCE_METADATA lock in MODE_X, held until the end of the write unit of work (WUOW):

if (_needCappedLock) {
    // X-lock the metadata resource for this capped collection until the end of the WUOW. This
    // prevents the primary from executing with more concurrency than secondaries.
    // See SERVER-21646.
    Lock::ResourceLock heldUntilEndOfWUOW{
        opCtx->lockState(), ResourceId(RESOURCE_METADATA, _ns.ns()), MODE_X};
}

However, the invariant in LockerImpl::saveLockStateAndUnlock() asserts that metadata locks never need to be saved and restored:

// We should never have to save and restore metadata locks.
invariant(RESOURCE_DATABASE == resId.getType() || RESOURCE_COLLECTION == resId.getType() ||
          (RESOURCE_GLOBAL == resId.getType() && isSharedLockMode(it->mode)) ||
          (resourceIdReplicationStateTransitionLock == resId && it->mode == MODE_IX));


When the primary steps down while the transaction is prepared, yieldLocksForPreparedTransactions() attempts to save and release the transaction's locks and hits the invariant:

Thread 34 "conn2" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7f7917065700 (LWP 32518)]
0x00007f792b992727 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f792b992727 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000556d7486172b in mongo::breakpoint () at src/mongo/util/debugger.cpp:75
#2  0x0000556d72b0b177 in mongo::invariantFailed (expr=expr@entry=0x556d749e8a38 "RESOURCE_DATABASE == resId.getType() || RESOURCE_COLLECTION == resId.getType() || (RESOURCE_GLOBAL == resId.getType() && isSharedLockMode(it->mode)) || (resourceIdReplicationStateTransitionLock == res"..., file=file@entry=0x556d749e8670 "src/mongo/db/concurrency/lock_state.cpp", line=line@entry=716) at src/mongo/util/assert_util.cpp:102
#3  0x0000556d72adf972 in mongo::invariantWithLocation<bool> (testOK=<optimized out>, line=716, file=0x556d749e8670 "src/mongo/db/concurrency/lock_state.cpp", expr=0x556d749e8a38 "RESOURCE_DATABASE == resId.getType() || RESOURCE_COLLECTION == resId.getType() || (RESOURCE_GLOBAL == resId.getType() && isSharedLockMode(it->mode)) || (resourceIdReplicationStateTransitionLock == res"...) at src/mongo/util/invariant.h:64
#4  mongo::LockerImpl::saveLockStateAndUnlock (this=0x556d7c293b00, stateOut=0x556d7c28ea20) at src/mongo/db/concurrency/lock_state.cpp:714
#5  0x0000556d73c73695 in mongo::TransactionParticipant::TxnResources::TxnResources (this=0x7f7917062500, wl=..., opCtx=0x556d7cedf180, stashStyle=<optimized out>) at /opt/mongodbtoolchain/stow/gcc-v3.zr9/include/c++/8.2.0/bits/unique_ptr.h:342
#6  0x0000556d73c74d11 in mongo::TransactionParticipant::Participant::refreshLocksForPreparedTransaction (this=this@entry=0x7f79170625c8, opCtx=opCtx@entry=0x556d7cedf180, yieldLocks=yieldLocks@entry=true) at src/mongo/util/concurrency/with_lock.h:72
#7  0x0000556d7333dedf in mongo::<lambda(mongo::OperationContext*, const SessionToKill&)>::operator() (__closure=<optimized out>, session=..., killerOpCtx=0x556d7cedf180) at src/mongo/db/kill_sessions_local.cpp:195
#8  std::_Function_handler<void(mongo::OperationContext*, const mongo::SessionCatalog::SessionToKill&), mongo::yieldLocksForPreparedTransactions(mongo::OperationContext*)::<lambda(mongo::OperationContext*, const SessionToKill&)> >::_M_invoke(const std::_Any_data &, mongo::OperationContext *&&, const mongo::SessionCatalog::SessionToKill &) (__functor=..., __args#0=<optimized out>, __args#1=...) at /opt/mongodbtoolchain/stow/gcc-v3.zr9/include/c++/8.2.0/bits/std_function.h:297
#9  0x0000556d7333e3b1 in std::function<void (mongo::OperationContext*, mongo::SessionCatalog::SessionToKill const&)>::operator()(mongo::OperationContext*, mongo::SessionCatalog::SessionToKill const&) const (__args#1=..., __args#0=<optimized out>, this=0x7f7917062a00) at /opt/mongodbtoolchain/stow/gcc-v3.zr9/include/c++/8.2.0/bits/std_function.h:682
#10 mongo::(anonymous namespace)::killSessionsAction(mongo::OperationContext *, const mongo::SessionKiller::Matcher &, const std::function<bool(const mongo::ObservableSession&)> &, const std::function<void(mongo::OperationContext*, const mongo::SessionCatalog::SessionToKill&)> &, mongo::ErrorCodes::Error) (opCtx=0x556d7cedf180, matcher=..., filterFn=..., killSessionFn=..., reason=<optimized out>) at src/mongo/db/kill_sessions_local.cpp:80
#11 0x0000556d7333f20c in mongo::yieldLocksForPreparedTransactions (opCtx=<optimized out>) at /opt/mongodbtoolchain/stow/gcc-v3.zr9/include/c++/8.2.0/bits/unique_ptr.h:342
#12 0x0000556d72ecebb0 in mongo::repl::ReplicationCoordinatorImpl::stepDown (this=0x556d786ee680, opCtx=<optimized out>, force=<optimized out>, waitTime=..., stepdownTime=...) at src/mongo/db/repl/replication_coordinator_impl.cpp:2032
#13 0x0000556d72e89e6e in mongo::repl::CmdReplSetStepDown::run (this=<optimized out>, opCtx=0x556d7cf46180, cmdObj=..., result=...) at src/mongo/util/duration.h:227
#14 0x0000556d740c9294 in mongo::BasicCommand::Invocation::run (this=0x556d78600840, opCtx=0x556d7cf46180, result=<optimized out>) at src/mongo/db/commands.cpp:592
#15 0x0000556d72ff7032 in mongo::(anonymous namespace)::runCommandImpl (sessionOptions=..., extraFieldsBuilder=0x7f79170632c0, behaviors=..., startOperationTime=..., replyBuilder=0x556d7cf105d0, request=..., invocation=<optimized out>, opCtx=<optimized out>) at src/mongo/db/service_entry_point_common.cpp:479
#16 mongo::(anonymous namespace)::execCommandDatabase (opCtx=<optimized out>, command=0x556d75822840 <mongo::repl::cmdReplSetStepDown>, request=..., replyBuilder=<optimized out>, behaviors=...) at src/mongo/db/service_entry_point_common.cpp:818
#17 0x0000556d72ff7ebe in mongo::(anonymous namespace)::<lambda()>::operator()(void) const (__closure=0x7f7917063ce0) at /opt/mongodbtoolchain/stow/gcc-v3.zr9/include/c++/8.2.0/bits/unique_ptr.h:342
#18 0x0000556d72ff8790 in mongo::(anonymous namespace)::receivedCommands (behaviors=..., message=..., opCtx=<optimized out>) at src/mongo/db/service_entry_point_common.cpp:905
#19 mongo::ServiceEntryPointCommon::handleRequest (opCtx=0x556d7cf46180, m=..., behaviors=...) at src/mongo/db/service_entry_point_common.cpp:1249
#20 0x0000556d72fe755c in mongo::ServiceEntryPointMongod::handleRequest (this=<optimized out>, opCtx=<optimized out>, m=...) at src/mongo/db/service_entry_point_common.h:59
...



 Comments   
Comment by Githook User [ 17/Apr/19 ]

Author:

{'name': 'Dianna', 'username': 'DiannaHohensee', 'email': 'dianna.hohensee@10gen.com'}

Message: SERVER-40321 Transaction CRUD ops on a shard against a capped collection fail
Branch: master
https://github.com/mongodb/mongo/commit/c539768709f9dbf1cdc59e7dc156c8eab7dad828

Comment by Dianna Hohensee (Inactive) [ 05/Apr/19 ]

Okay, checked with Eric for any perf concerns about doing this on the write path, and there are none. It shall be as requested.

Comment by Judah Schvimer [ 04/Apr/19 ]

alyson.cabral would prefer this check happen as early as possible, and Collection::isCapped already exists and, unlike the temp flag, cannot change. We also want this to be prohibited for all transactions on shard servers, not just ones that go through prepare.
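
A write-path check along these lines would implement that request (a minimal sketch, not the committed patch; the placement inside CollectionImpl's insert path and the exact spellings of inMultiDocumentTransaction() and ClusterRole::ShardServer are assumptions):

// Sketch only: reject transactional writes to capped collections on shard
// servers. Placement and helper names are assumptions; the committed change
// may differ.
if (opCtx->inMultiDocumentTransaction() && isCapped() &&
    serverGlobalParams.clusterRole == ClusterRole::ShardServer) {
    uasserted(ErrorCodes::OperationNotSupportedInTransaction,
              str::stream() << "Transaction CRUD operations are not allowed on capped collection "
                            << ns().ns() << " on a shard server");
}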

Comment by Dianna Hohensee (Inactive) [ 04/Apr/19 ]

geert.bosch, do we want to advocate for checking at prepare time, as we did in SERVER-38139 for temp collections, or for checking at write time for capped collections? Collection::isCapped does appear to check in-memory state, unlike the temp check, but checking at write time would add two additional checks (one for isShard, one for isCapped) to the write path.

Comment by Judah Schvimer [ 02/Apr/19 ]

After discussing with alyson.cabral, tess.avitabile, and kaloian.manassiev, we will error on the first statement in a MongoDB transaction that touches a capped collection, on shard servers only (i.e., on any node whose transactions could be prepared). Replica sets that aren't shard servers will still be allowed to touch capped collections.

Comment by Judah Schvimer [ 01/Apr/19 ]

If alyson.cabral is ok with it, we'll make the behavior: mongos does nothing; mongod returns an OperationNotSupportedInTransaction error when a statement touches a capped collection; and the transaction may be implicitly aborted by the replica set, depending on whether the implementation requires it. This prohibition will apply alike to shard servers and to replica sets that are not part of a sharded cluster, making it a slight behavior change from 4.0 single-replica-set transactions, which did allow transactions to touch capped collections.

Comment by Kaloian Manassiev [ 01/Apr/19 ]

It is correct that, with the 2PC commit optimizations, a transaction might not become 2PC until after it has accessed a capped collection (or until a collection it has accessed has become capped).

I am fine with disabling transactions on capped collections and would rather do it not just for sharding, but for replica sets as well.

Comment by Andy Schwerin [ 01/Apr/19 ]

I think it makes sense to fail transactions that touch capped collections. I'd be willing to do it universally, since, as Geert points out, the sequential-ordering requirement on capped collections makes using them with transactions hazardous. I'd also be willing to just do it for all transactions on sharded clusters: if a shard server detects a transaction accessing a capped collection, it could abort the transaction immediately. It's not really worth waiting to see whether the transaction might become two-phase, and that's not detectable a priori; you sometimes don't know until commit, or at least until the second write, and that seems like a bad place to report the error.

Comment by Judah Schvimer [ 01/Apr/19 ]

kaloian.manassiev, would the above be possible?

We could alternatively fail any MongoDB transaction that tried to touch a capped collection as soon as it attempted to do so, not just cross-shard ones. We could also do this only for single-replica-set transactions on shard servers, leaving non-sharded behavior unchanged. If we only want to prevent cross-shard transactions from touching capped collections, we cannot do that until prepare time on mongod (though mongos could potentially do it earlier).

Comment by Alyson Cabral (Inactive) [ 01/Apr/19 ]

Yes, this is ok behavior. Is it possible to make this even more obvious to end users by not allowing them to start a transaction on a capped collection through a mongos? I try to avoid things that could work in dev and break as soon as someone pushes to production, just due to the placement of chunks. I prefer stricter failing if it is ultimately more obvious to catch.

CC: kay.kim

Comment by Judah Schvimer [ 29/Mar/19 ]

This lock yielding was added in SERVER-37199 to release locks on stepdown and reacquire them on stepup. We could investigate whether relaxing the invariant for that specific lock restore is safe. Concurrent writers are prevented by secondary oplog application concurrency control. Concurrent readers would be possible, but that may be safe.
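
Concretely, relaxing that invariant would mean admitting RESOURCE_METADATA into the allowed set in LockerImpl::saveLockStateAndUnlock(), along these lines (illustrative only; as the rest of this comment explains, it would not be sufficient):

// Sketch only: the invariant from the description, with a hypothetical extra
// disjunct that would permit yielding capped-collection metadata locks.
invariant(RESOURCE_DATABASE == resId.getType() || RESOURCE_COLLECTION == resId.getType() ||
          RESOURCE_METADATA == resId.getType() ||  // hypothetical relaxation
          (RESOURCE_GLOBAL == resId.getType() && isSharedLockMode(it->mode)) ||
          (resourceIdReplicationStateTransitionLock == resId && it->mode == MODE_IX));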

"a prepared transaction that succeeds on one node may not succeed on another node; however, a prepared transaction may not be aborted."

geert.bosch's third point (quoted above) makes it clear to me, though, that the above invariant relaxation won't be sufficient given the guarantees prepared transactions are expected to provide.

I think banning capped collections in prepared transactions, similar to what SERVER-38139 did for temp collections, makes sense. I think schwerin is on board with this, given offline discussion. alyson.cabral, is this acceptable behavior?

Comment by Geert Bosch [ 28/Mar/19 ]

I think that cross-shard transactions involving capped collections (other than the oplog) are problematic, for a few reasons:

  • Capped collections enforce sequential insert order. As a result, only a single operation may be in progress at a time. Currently we use MODE_X locks to enforce the ordering, but prepared transactions do not allow such locks.
  • Prepared transactions must be able to survive replica-set state transitions, but in absence of locking, there is no good way to enforce ordering.
  • Operations on capped collections may implicitly remove documents from a collection, but whether that happens may differ between replica-set members. So a prepared transaction that succeeds on one node may not succeed on another node; however, a prepared transaction may not be aborted.

Without a significant redesign of capped collections, we cannot support cross-shard operations on them. I think any effort to make them somewhat work should instead be spent on a workable replacement in the form of improvements to TTL collections.

We should just add an extra check at prepare time to not allow capped collections.
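
For comparison with the write-time check that was ultimately chosen (see the later comments above), a prepare-time check might look roughly as follows. This is purely illustrative: the helper name and the way the transaction's affected namespaces are obtained are hypothetical.

// Hypothetical sketch of a prepare-time check, in the spirit of the
// SERVER-38139 temp-collection check; names are invented for illustration.
void uassertNoCappedCollections(OperationContext* opCtx,
                                const std::vector<NamespaceString>& affectedNamespaces) {
    for (const auto& nss : affectedNamespaces) {
        AutoGetCollection coll(opCtx, nss, MODE_IS);
        uassert(ErrorCodes::OperationNotSupportedInTransaction,
                str::stream() << "Cannot prepare a transaction that touches capped collection "
                              << nss.ns(),
                !coll.getCollection() || !coll.getCollection()->isCapped());
    }
}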
