[SERVER-60682] TransactionCoordinator may block acquiring WiredTiger write ticket to persist its decision, prolonging transactions being in the prepared state Created: 13/Oct/21  Updated: 07/Nov/23  Resolved: 17/Nov/21

Status: Closed
Project: Core Server
Component/s: Concurrency, Sharding
Affects Version/s: 4.2.0, 4.4.0, 5.0.0, 5.1.0-rc0
Fix Version/s: 5.2.0, 5.1.2, 5.0.6, 4.4.11, 4.2.19

Type: Bug Priority: Critical - P2
Reporter: Max Hirschhorn Assignee: Josef Ahmad
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-65821 Deadlock during setFCV when there are... Closed
related to SERVER-60685 TransactionCoordinator may interrupt ... Closed
related to SERVER-39995 Add concurrency_simultaneous_sharding... Backlog
is related to SERVER-57476 Operation may block on prepare confli... Closed
is related to SERVER-82883 Recovering TransactionCoordinator on ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1, v5.0, v4.4, v4.2
Sprint: Execution Team 2021-11-29
Participants:
Case:

 Description   

The TransactionCoordinator performs an update to the config.transaction_coordinators collection on the local shard to record its commit or abort decision for the cross-shard transaction. During this step of the two-phase commit coordination, the cross-shard transaction is in the prepared state on the participant shards, which means other multi-statement transactions can hit a prepare conflict while waiting for it to commit or abort. These other multi-statement transactions block while holding storage resources, including a WiredTiger write ticket. It is therefore possible for all WiredTiger write tickets in the system to be temporarily exhausted due to a prepare conflict. It would be less disruptive to the system if the TransactionCoordinator could still record its decision locally in this situation, so that it could more rapidly deliver the decision to the participant shards and clear their prepared state.

Note that once transactionLifetimeLimitSeconds has elapsed (default: 60 seconds), the multi-statement transactions holding WiredTiger write tickets are aborted and release their tickets, allowing the TransactionCoordinator to acquire one.



 Comments   
Comment by Githook User [ 21/Dec/21 ]

Author:

Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad)

Message: SERVER-60682 Exempt transaction coordinators and journal flusher from acquiring storage tickets

Co-authored-by: Louis Williams <louis.williams@mongodb.com>
(cherry picked from commit 1bdff76322b144ef27060fe79324fe3cce4bb17a)
(cherry picked from commit 02d950aee92600ec4613912dc1c84118e8c961c0)
Branch: v4.2
https://github.com/mongodb/mongo/commit/6d42cf673cf50d9ebc298069ec6479358dc689ea

Comment by Githook User [ 10/Dec/21 ]

Author:

Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad)

Message: SERVER-60682 Exempt transaction coordinators and journal flusher from acquiring storage tickets

Co-authored-by: Max Hirschhorn <max.hirschhorn@mongodb.com>
(cherry picked from commit 1bdff76322b144ef27060fe79324fe3cce4bb17a)
Branch: v4.4
https://github.com/mongodb/mongo/commit/02d950aee92600ec4613912dc1c84118e8c961c0

Comment by Githook User [ 07/Dec/21 ]

Author:

Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad)

Message: SERVER-60682 Exempt transaction coordinators and journal flusher from acquiring storage tickets

Co-authored-by: Max Hirschhorn <max.hirschhorn@mongodb.com>
(cherry picked from commit 1bdff76322b144ef27060fe79324fe3cce4bb17a)
Branch: v5.0
https://github.com/mongodb/mongo/commit/79599d1ea413cfc331d8b48ac617dec08bdcba0f

Comment by Githook User [ 06/Dec/21 ]

Author:

Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad)

Message: SERVER-60682 Exempt transaction coordinators and journal flusher from acquiring storage tickets

Co-authored-by: Max Hirschhorn <max.hirschhorn@mongodb.com>
(cherry picked from commit 1bdff76322b144ef27060fe79324fe3cce4bb17a)
Branch: v5.1
https://github.com/mongodb/mongo/commit/0f9e2dc19f68c8db49539bbaf6542c1bde9025a0

Comment by Githook User [ 17/Nov/21 ]

Author:

Josef Ahmad <josef.ahmad@mongodb.com> (josefahmad)

Message: SERVER-60682 Exempt transaction coordinators and journal flusher from acquiring storage tickets

Co-authored-by: Max Hirschhorn <max.hirschhorn@mongodb.com>
Branch: master
https://github.com/mongodb/mongo/commit/1bdff76322b144ef27060fe79324fe3cce4bb17a

Comment by Max Hirschhorn [ 14/Oct/21 ]

It looks like it does need to go back to the Storage Execution team, because waitForMajorityWithHangFailpoint(), called after writing the decision, also ends up blocking: the JournalFlusher attempts to acquire a WiredTiger write ticket.

Thread 18 (Thread 0x7f2820bf8700 (LWP 18343)):
#0  0x00007f2835f439b2 in do_futex_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f2835f43ac3 in __new_sem_wait_slow () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f2831169e9d in mongo::TicketHolder::waitForTicketUntil (this=0x7f282f8c4800 <mongo::(anonymous namespace)::openWriteTransaction>, opCtx=0x560eab665000, until=...) at src/mongo/util/concurrency/ticketholder.cpp:112
#3  0x00007f28311a718b in mongo::LockerImpl::_acquireTicket (this=0x560eab640080, opCtx=<optimized out>, mode=<optimized out>, deadline=...) at src/mongo/db/concurrency/lock_state.cpp:389
#4  0x00007f28311aabb7 in mongo::LockerImpl::lockGlobal (this=0x560eab640080, opCtx=0x560eab665000, mode=mongo::MODE_IX, deadline=...) at src/mongo/db/concurrency/lock_state.cpp:409
#5  0x00007f283119926f in mongo::Lock::GlobalLock::_takeGlobalAndRSTLLocks (this=0x7f2820bf68a8, lockMode=mongo::MODE_IX, deadline=...) at /opt/mongodbtoolchain/revisions/ba5f698948588cb5da922d3cadee990f5f9f48cd/stow/gcc-v3.pPo/include/c++/8.5.0/bits/unique_ptr.h:345
#6  0x00007f283119965e in mongo::Lock::GlobalLock::GlobalLock (this=0x7f2820bf68a8, opCtx=0x560eab665000, lockMode=mongo::MODE_IX, deadline=..., behavior=<optimized out>, skipRSTLLock=<optimized out>) at src/mongo/db/concurrency/d_concurrency.cpp:167
#7  0x00007f2831199848 in mongo::Lock::DBLock::DBLock (this=0x7f2820bf6890, opCtx=<optimized out>, db="local", mode=mongo::MODE_IX, deadline=...) at src/mongo/db/concurrency/lock_manager_defs.h:106
#8  0x00007f2831ffcc70 in mongo::AutoGetDb::AutoGetDb (this=0x7f2820bf6868, opCtx=0x560eab665000, dbName="local", mode=mongo::MODE_IX, deadline=...) at /opt/mongodbtoolchain/revisions/ba5f698948588cb5da922d3cadee990f5f9f48cd/stow/gcc-v3.pPo/include/c++/8.5.0/ext/new_allocator.h:86
#9  0x00007f2831ffd105 in mongo::AutoGetCollection::AutoGetCollection (this=0x7f2820bf6860, opCtx=0x560eab665000, nsOrUUID=..., modeColl=mongo::MODE_IX, viewMode=mongo::AutoGetCollectionViewMode::kViewsForbidden, deadline=...) at src/mongo/base/string_data.h:66
#10 0x00007f2833ccec99 in mongo::repl::ReplicationConsistencyMarkersImpl::refreshOplogTruncateAfterPointIfPrimary (this=0x560ea6880180, opCtx=0x560eab665000) at /opt/mongodbtoolchain/revisions/ba5f698948588cb5da922d3cadee990f5f9f48cd/stow/gcc-v3.pPo/include/c++/8.5.0/bits/char_traits.h:287
#11 0x00007f28345e43cd in mongo::repl::ReplicationCoordinatorExternalStateImpl::getToken (this=0x560ea68822c0, opCtx=0x560eab665000) at src/mongo/db/repl/replication_coordinator_external_state_impl.cpp:1112
#12 0x00007f282f886d05 in mongo::WiredTigerSessionCache::waitUntilDurable (this=0x560ea6874c00, opCtx=opCtx@entry=0x560eab665000, syncType=syncType@entry=mongo::WiredTigerSessionCache::Fsync::kJournal, useListener=useListener@entry=mongo::WiredTigerSessionCache::UseJournalListener::kUpdate) at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp:327
#13 0x00007f282f87dd93 in mongo::WiredTigerRecoveryUnit::waitUntilDurable (this=0x560eab66c000, opCtx=0x560eab665000) at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:275
#14 0x00007f28354a9e21 in mongo::JournalFlusher::run (this=<optimized out>) at src/third_party/boost/boost/optional/detail/optional_aligned_storage.hpp:53


diff --git a/src/mongo/db/s/transaction_coordinator_util.cpp b/src/mongo/db/s/transaction_coordinator_util.cpp
index bb6832314b..045264c870 100644
--- a/src/mongo/db/s/transaction_coordinator_util.cpp
+++ b/src/mongo/db/s/transaction_coordinator_util.cpp
@@ -36,6 +36,7 @@
 #include "mongo/client/remote_command_retry_scheduler.h"
 #include "mongo/db/commands/txn_cmds_gen.h"
 #include "mongo/db/commands/txn_two_phase_commit_cmds_gen.h"
+#include "mongo/db/concurrency/lock_state.h"
 #include "mongo/db/curop.h"
 #include "mongo/db/dbdirectclient.h"
 #include "mongo/db/internal_transactions_feature_flag_gen.h"
@@ -437,6 +438,7 @@ Future<repl::OpTime> persistDecision(txn::AsyncWorkScheduler& scheduler,
             return scheduler.scheduleWork(
                 [lsid, txnNumberAndRetryCounter, participants, decision](OperationContext* opCtx) {
                     FlowControl::Bypass flowControlBypass(opCtx);
+                    SkipTicketAcquisitionForLock skipTicketAcquisition(opCtx);
                     getTransactionCoordinatorWorkerCurOpRepository()->set(
                         opCtx,
                         lsid,

Comment by Max Hirschhorn [ 13/Oct/21 ]

Sending this over to the Storage Execution team because I felt they were best positioned to facilitate having the update (currently run via DBDirectClient) call Locker::skipAcquireTicket().

Edit: Actually it looks like SkipTicketAcquisitionForLock can be used to do this already.

Generated at Thu Feb 08 05:50:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.