[SERVER-35770] Running a multi-statement transaction when all WiredTiger write tickets are exhausted may lead to deadlock
Created: 25/Jun/18  Updated: 29/Oct/23  Resolved: 08/Aug/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.0.0
Fix Version/s: 4.0.2, 4.1.2

Type: Bug
Priority: Critical - P2
Reporter: Max Hirschhorn
Assignee: Matthew Russotto
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-43868 Session::TxnResources::release() can ... Closed
related to SERVER-39320 lower wiredTigerConcurrentWriteTransa... Closed
is related to SERVER-35217 killSessions command attempts to kill... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Repl 2018-07-16, Repl 2018-07-30, Repl 2018-08-13
Participants:
Linked BF Score: 62

 Description   

Unstashing a transaction's lock state also involves reacquiring a WiredTiger write ticket. The call to Locker::reacquireTicket() has no lock timeout and occurs while holding Session::_mutex. Because killAllExpiredTransactions() calls Session::abortArbitraryTransactionIfExpired(), which attempts to acquire Session::_mutex, expired transactions cannot be reaped while a transaction's lock state is being unstashed and no WiredTiger write tickets are available.

The two stack traces that follow show the blocked threads: "conn1" is waiting for a ticket inside Locker::reacquireTicket(), and the periodic job thread "startPe.actions" is waiting for Session::_mutex inside Session::abortArbitraryTransactionIfExpired().

Thread 36: "conn1" (Thread 0x7f8e579ef700 (LWP 14971))
#0  sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:101
#1  0x00007f8e595b861e in mongo::TicketHolder::waitForTicketUntil (this=<optimized out>, opCtx=<optimized out>, until=...) at src/mongo/util/concurrency/ticketholder.cpp:104
#2  0x00007f8e595b8673 in mongo::TicketHolder::waitForTicket (this=<optimized out>, opCtx=<optimized out>) at src/mongo/util/concurrency/ticketholder.cpp:90
#3  0x00007f8e5951eacd in mongo::LockerImpl<false>::_acquireTicket (this=<optimized out>, opCtx=<optimized out>, mode=<optimized out>, deadline=...) at src/mongo/db/concurrency/lock_state.cpp:338
#4  0x00007f8e5951eb1d in mongo::LockerImpl<false>::reacquireTicket (this=<optimized out>, opCtx=<optimized out>) at src/mongo/db/concurrency/lock_state.cpp:322
#5  0x00007f8e592fd06f in mongo::Session::TxnResources::release (this=<optimized out>, opCtx=<optimized out>) at src/mongo/db/session.cpp:658
#6  0x00007f8e593021e3 in mongo::Session::unstashTransactionResources (this=<optimized out>, opCtx=<optimized out>, cmdName=...) at src/mongo/db/session.cpp:775
#7  0x00007f8e584adc80 in mongo::(anonymous namespace)::invokeInTransaction (opCtx=<optimized out>, invocation=<optimized out>, replyBuilder=<optimized out>) at src/mongo/db/service_entry_point_common.cpp:466
#8  0x00007f8e584b0dd9 in runCommandImpl (sessionOptions=..., extraFieldsBuilder=<optimized out>, behaviors=..., startOperationTime=..., replyBuilder=<optimized out>, request=..., invocation=<optimized out>, opCtx=<optimized out>) at src/mongo/db/service_entry_point_common.cpp:537
#9  mongo::(anonymous namespace)::execCommandDatabase (opCtx=<optimized out>, command=<optimized out>, request=..., replyBuilder=<optimized out>, behaviors=...) at src/mongo/db/service_entry_point_common.cpp:873
#10 0x00007f8e584b2291 in mongo::(anonymous namespace)::<lambda()>::operator()(void) const (__closure=<optimized out>) at src/mongo/db/service_entry_point_common.cpp:1018
#11 0x00007f8e584b2291 in mongo::ServiceEntryPointCommon::handleRequest (opCtx=<optimized out>, m=..., behaviors=...)
#12 0x00007f8e584b31d1 in runCommands (behaviors=..., message=..., opCtx=<optimized out>) at src/mongo/db/service_entry_point_common.cpp:1033
#13 mongo::ServiceEntryPointCommon::handleRequest (opCtx=<optimized out>, m=..., behaviors=...) at src/mongo/db/service_entry_point_common.cpp:1307
#14 0x00007f8e584a09da in mongo::ServiceEntryPointMongod::handleRequest (this=<optimized out>, opCtx=<optimized out>, m=...) at src/mongo/db/service_entry_point_mongod.cpp:123
#15 0x00007f8e584ab76a in mongo::ServiceStateMachine::_processMessage (this=<optimized out>, guard=...) at src/mongo/transport/service_state_machine.cpp:378
...
#36 mongo::(anonymous namespace)::runFunc (ctx=<optimized out>) at src/mongo/transport/service_entry_point_utils.cpp:55
#37 0x00007f8e56844184 in start_thread (arg=0x7f8e579ef700) at pthread_create.c:312
#38 0x00007f8e5657103d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 39: "startPe.actions" (Thread 0x7f8e4551d700 (LWP 14968))
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f8e56846649 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f8e56846470 in __GI___pthread_mutex_lock (mutex=0x7f8e5f6f8080) at ../nptl/pthread_mutex_lock.c:79
#3  0x00007f8e592fdcc7 in __gthread_mutex_lock (__mutex=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/x86_64-mongodb-linux/bits/gthr-default.h:748
#4  lock (this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/mutex:135
#5  lock_guard (__m=..., this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/mutex:386
#6  mongo::Session::abortArbitraryTransactionIfExpired (this=<optimized out>) at src/mongo/db/session.cpp:858
#7  0x00007f8e58a90d7b in mongo::<lambda(mongo::OperationContext*, mongo::Session*)>::operator()(mongo::Session *, mongo::OperationContext *) (session=<optimized out>, opCtx=<optimized out>, __closure=<optimized out>) at src/mongo/db/kill_sessions_local.cpp:76
#8  0x00007f8e59306504 in operator() (__args#1=<optimized out>, __args#0=<optimized out>, this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:2267
#9  mongo::SessionCatalog::scanSessions(mongo::OperationContext*, mongo::SessionKiller::Matcher const&, std::function<void (mongo::OperationContext*, mongo::Session*)>) (this=<optimized out>, opCtx=<optimized out>, matcher=..., workerFn=...) at src/mongo/db/session_catalog.cpp:208
#10 0x00007f8e58a911b8 in mongo::killAllExpiredTransactions (opCtx=<optimized out>) at src/mongo/db/kill_sessions_local.cpp:96
#11 0x00007f8e5881041f in operator() (__closure=<optimized out>, client=<optimized out>) at src/mongo/db/periodic_runner_job_abort_expired_transactions.cpp:91
#12 std::_Function_handler<void(mongo::Client*), mongo::startPeriodicThreadToAbortExpiredTransactions(mongo::ServiceContext*)::<lambda(mongo::Client*)> >::_M_invoke(const std::_Any_data &, <unknown type in /home/maxh/debugging/mongo/mongod, CU 0x0, DIE 0x316c7>) (__functor=..., __args#0=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:1871
#13 0x00007f8e5923d47c in operator() (__args#0=<optimized out>, this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:2267
#14 operator() (__closure=<optimized out>) at src/mongo/util/periodic_runner_impl.cpp:108
#15 _M_invoke<> (this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:1531
#16 operator() (this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:1520
#17 std::thread::_Impl<std::_Bind_simple<mongo::PeriodicRunnerImpl::PeriodicJobImpl::run()::<lambda()>()> >::_M_run(void) (this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/thread:115
#18 0x00007f8e59e680d0 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:84
#19 0x00007f8e56844184 in start_thread (arg=0x7f8e4551d700) at pthread_create.c:312
#20 0x00007f8e5657103d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
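
The blocking pattern in the two stack traces above can be reproduced in miniature. The following is a minimal sketch, not MongoDB code: `TicketPool`, `acquireFor`, and `reaperCanLockSession` are hypothetical names invented for illustration, a `std::timed_mutex` plays the role of Session::_mutex, and the ticket wait is capped at 500ms only so the demo terminates (the wait described in this ticket has no timeout).

```cpp
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>
#include <thread>

using namespace std::chrono_literals;

// Hypothetical ticket pool for illustration; not MongoDB's TicketHolder.
class TicketPool {
public:
    explicit TicketPool(int tickets) : _available(tickets) {}

    // Waits up to `maxWait` for a free ticket. Returns true if one was taken.
    bool acquireFor(std::chrono::milliseconds maxWait) {
        std::unique_lock<std::mutex> lk(_mutex);
        if (!_cv.wait_for(lk, maxWait, [this] { return _available > 0; }))
            return false;
        --_available;
        return true;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _available;
};

// Returns true if the "reaper" can lock the session mutex while the
// "unstasher" thread is reacquiring its ticket under that same mutex.
bool reaperCanLockSession(int freeTickets) {
    TicketPool pool(freeTickets);
    std::timed_mutex sessionMutex;  // stands in for Session::_mutex
    std::promise<void> holdingMutex;

    std::thread unstasher([&] {
        std::lock_guard<std::timed_mutex> sessionLock(sessionMutex);
        holdingMutex.set_value();
        // Assumption: 500ms stands in for the unbounded wait in the report.
        (void)pool.acquireFor(500ms);
    });

    // Start the reaper attempt only once the unstasher holds the mutex.
    holdingMutex.get_future().wait();
    // The reaper gives up after 100ms so the sketch cannot hang.
    bool locked = sessionMutex.try_lock_for(100ms);
    if (locked)
        sessionMutex.unlock();
    unstasher.join();
    return locked;
}
```

With a free ticket the unstasher finishes at once and the reaper gets the mutex; with zero tickets the reaper's attempt times out because the mutex stays held for the whole ticket wait.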



 Comments   
Comment by Githook User [ 24/Aug/18 ]

Author: Matthew Russotto <matthew.russotto@10gen.com> (mtrussotto)

Message: SERVER-35770 Running a multi-statement transaction when all WiredTiger write tickets are exhausted may lead to deadlock

(cherry picked from commit 210bb5d91cb3c77bb3ed169114f8b85cd1062fb3)
Branch: v4.0
https://github.com/mongodb/mongo/commit/ecf9c1bd4cc1b1ed8a1606cfc5c0480eaf7f7020

Comment by Githook User [ 08/Aug/18 ]

Author: Matthew Russotto <matthew.russotto@10gen.com> (mtrussotto)

Message: SERVER-35770 Running a multi-statement transaction when all WiredTiger write tickets are exhausted may lead to deadlock
Branch: master
https://github.com/mongodb/mongo/commit/210bb5d91cb3c77bb3ed169114f8b85cd1062fb3

Comment by Eric Milkie [ 25/Jun/18 ]

I think changing Locker::reacquireTicket() to use the operation's lock timeout as its own timeout will be a fine solution to this problem.
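
The idea in the comment above, bounding the ticket wait with a deadline, might look roughly like the sketch below. It is an illustration under stated assumptions and not the change made in the linked commits: `TicketPool`, `LockTimeout`, and `unstashWithDeadline` are hypothetical names, and a plain `std::mutex` stands in for Session::_mutex.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

using Clock = std::chrono::steady_clock;

// Hypothetical ticket pool for illustration; not MongoDB's TicketHolder.
class TicketPool {
public:
    explicit TicketPool(int tickets) : _available(tickets) {}

    // Blocks until a ticket is free or `deadline` passes.
    // Returns false if the deadline passed without a ticket.
    bool waitForTicketUntil(Clock::time_point deadline) {
        std::unique_lock<std::mutex> lk(_mutex);
        if (!_cv.wait_until(lk, deadline, [this] { return _available > 0; }))
            return false;
        --_available;
        return true;
    }

    int available() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _available;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _available;
};

// Hypothetical error type standing in for a lock-timeout error.
struct LockTimeout : std::runtime_error {
    explicit LockTimeout(const char* what) : std::runtime_error(what) {}
};

// Reacquires a ticket while holding the session mutex, but gives up after
// `maxWait`. Throwing unwinds the lock_guard, so the session mutex is
// released and another thread (such as a reaper) can take it.
void unstashWithDeadline(std::mutex& sessionMutex,
                         TicketPool& pool,
                         std::chrono::milliseconds maxWait) {
    std::lock_guard<std::mutex> sessionLock(sessionMutex);
    if (!pool.waitForTicketUntil(Clock::now() + maxWait))
        throw LockTimeout("timed out waiting for a write ticket");
}
```

The design point is that the wait can no longer outlive its deadline while the session mutex is held, so a thread that needs that mutex is blocked for at most `maxWait`.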
