[SERVER-51281] mongod live locked Created: 01/Oct/20  Updated: 06/Dec/23  Resolved: 12/Feb/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0, 4.4.5

Type: Bug Priority: Major - P3
Reporter: Mark Weaver Assignee: Lingzhi Deng
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Screenshot 2021-02-05 at 1.13.57 PM.png (PNG), diagnostic.data.zip (ZIP), gdb.log (text), txn.png (PNG)
Issue Links:
Backports
Depends
Related
related to SERVER-50365 Stuck with long-running transactions ... Closed
is related to WT-7056 Fix stuck cache eviction worker Closed
is related to SERVER-54534 No need to wait for writeConcern on a... Closed
is related to WT-7926 Investigate potential WT cache evicti... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Storage - Ra 2021-01-25, Repl 2021-02-22
Participants:
Case:

 Description   

We are encountering an issue with mongo-4.4.1 (4.4.0 also exhibited the issue) running under kubernetes using: https://hub.docker.com/layers/mongo/library/mongo/4.4/images/sha256-31f6433f7cfcd2180483e40728cbf97142df1e85de36d80d75c93e5e7fe10405?context=explore

The problem is that occasionally the server will spin the CPU, and connections pile up running queries that never finish. It appears that most of them are waiting for a lock that is never released, and they stay stuck in that state – the clients eventually shut down the actual TCP connections, but the server does not decrement the connection count. We can't reproduce this reliably – it happens every few days.

The previous version of mongo that we were using was 4.0.5 and we had no problems there.

If I can collect any other information please let me know (I have a core dump that I generated the backtraces from, but I can't attach that in case it contains user data).

Thanks,

Mark
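
For reference, the kind of information requested later in this ticket (currentOp output, serverStatus counters, diagnostic.data) can be gathered with a few admin commands. A minimal pymongo sketch, assuming a placeholder connection string rather than anything from this deployment:

    # Triage sketch (placeholder URI): dump operations that have been running for a
    # long time, plus the connection counters that keep growing during the hang.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder connection string

    # Operations running longer than 60 seconds, with their latch-wait info if any.
    for op in client.admin.command("currentOp")["inprog"]:
        if op.get("secs_running", 0) >= 60:
            print(op.get("opid"), op.get("secs_running"), op.get("desc"), op.get("waitingForLatch"))

    # serverStatus().connections: 'current' is the count that never comes back down here.
    print(client.admin.command("serverStatus")["connections"])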



 Comments   
Comment by Githook User [ 22/Feb/21 ]

Author: Lingzhi Deng <lingzhi.deng@mongodb.com> (ldennis)

Message: SERVER-51281: Skip LastOpFixer for multi-doc transactions

(cherry picked from commit d7b4c6bcaa61dc54d830007657babdd6337f9cfb)
Branch: v4.4
https://github.com/mongodb/mongo/commit/10baed74527acfce053fa17ce47b643d47757dd4

Comment by Githook User [ 10/Feb/21 ]

Author: Lingzhi Deng <lingzhi.deng@mongodb.com> (ldennis)

Message: SERVER-51281: Skip LastOpFixer for multi-doc transactions
Branch: master
https://github.com/mongodb/mongo/commit/d7b4c6bcaa61dc54d830007657babdd6337f9cfb

Comment by Bruce Lucas (Inactive) [ 07/Jan/21 ]

Is this the same underlying issue as WT-7056?

Comment by Louis Williams [ 08/Oct/20 ]

bruce.lucas, my understanding is that those WT tickets will allow more data to be evictable overall. What's not clear to me is how, or to what extent, dirty content is allowed to expand beyond the default threshold: is the threshold a high-water mark that can be exceeded, a hard limit, or is dirty content still bounded only by the cache size? All of the threads waiting for eviction seem to assume that cache eviction will allow them to make forward progress.

This is what I see as the problem: assume there is a hard limit on dirty content (whether the configured limit or the WT cache size). Suppose every operation has uncommitted data taking up cache space. All are trying to commit, but because of the lack of cache space, each is tasked with performing eviction first. No operation can evict anything, because nothing can commit, and only committed data is evictable (uncommitted data never is). The obvious problem is that operations with uncommitted data fill up the cache while waiting on eviction before they can commit, which in turn prevents other operations from committing.

I know there is logic already in WT to roll back operations when no progress is being made for eviction. Why didn't that kick in for this scenario?

Comment by Bruce Lucas (Inactive) [ 07/Oct/20 ]

louis.williams, I'm not seeing how those WT tickets will help: isn't it the case that the cache content is not evictable because it is pinned by the open transaction, and that open transaction is unable to proceed because there is no space in the cache to hold its next write? In short, that single transaction wants to generate more dirty content than the cache dirty content limit allows, and it's fairly fundamental to the design of WT that that is not possible? If so then I don't see how those WT tickets would help, but I may be missing something.

However I do feel that it could be considered a flaw that we have no way to abort transactions that get stuck in this way, but instead we allow the whole instance to become wedged permanently.

Comment by Louis Williams [ 07/Oct/20 ]

To summarize what is going on here:

  • The small cache means that the threshold for dirty content is much lower. Due to certain deficiencies (e.g. WT-6490), cache eviction can get stuck when there is too much dirty content and none of it is evictable. Other tickets, WT-6624 and WT-6707, aim to further improve this behavior.
  • The "PeriodicThreadToAbortExpiredTransactions" is waiting for a specific session to be checked back in so that it can kill it. There also appears to be a thread (below) that is stuck waiting for cache with a session checked out. Without all the details, the most likely explanation is that the operation holding the checked-out session is stuck and is not killable, which in turn blocks the transaction killer.

    Thread 131 (Thread 0x7fee12406700 (LWP 22488)):
    #0  0x00007fee3045ded9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    #1  0x00005595d19391b9 in __wt_cond_wait_signal ()
    #2  0x00005595d190aabd in __wt_cache_eviction_worker ()
    #3  0x00005595d19a168d in __wt_btcur_prev ()
    #4  0x00005595d18f4546 in __curfile_prev ()
    #5  0x00005595d18af4f1 in mongo::WiredTigerRecordStore::getLatestOplogTimestamp(mongo::OperationContext*) const ()
    #6  0x00005595d1b388f8 in mongo::repl::ReplicationCoordinatorImpl::getLatestWriteOpTime(mongo::OperationContext*) const ()
    #7  0x00005595d321f9c6 in mongo::repl::ReplClientInfo::setLastOpToSystemLastOpTime(mongo::OperationContext*) ()
    #8  0x00005595d321fd6c in mongo::repl::ReplClientInfo::setLastOpToSystemLastOpTimeIgnoringInterrupt(mongo::OperationContext*) ()
    #9  0x00005595d1ffa327 in mongo::(anonymous namespace)::LastOpFixer::~LastOpFixer() [clone .part.1301] ()
    #10 0x00005595d1ffbe95 in mongo::performUpdates(mongo::OperationContext*, mongo::write_ops::Update const&) ()
    #11 0x00005595d1f5f766 in mongo::(anonymous namespace)::CmdUpdate::Invocation::runImpl(mongo::OperationContext*, mongo::BSONObjBuilder&) const ()
    #12 0x00005595d1f5d2ad in mongo::(anonymous namespace)::WriteCommand::InvocationBase::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*) ()
    #13 0x00005595d24bdfbf in mongo::CommandHelpers::runCommandInvocation(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::CommandInvocation*, mongo::rpc::ReplyBuilderInterface*) ()
    #14 0x00005595d1c76dd1 in mongo::(anonymous namespace)::invokeWithSessionCheckedOut(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::CommandInvocation*, mongo::OperationSessionInfoFromClient const&, mongo::rpc::ReplyBuilderInterface*) ()
    

  • The "LogicalSessionCacheImpl" reaper is deleting already-idle/expired sessions from the config.transactions table. I believe this reaper is just caught-up in the cache eviction problem. Deleting an already-idle session from the config.transactions table will not relieve cache pressure in the same way as the transaction reaper.

I'm not sure if there's a specific bug with either the transaction killer or the session reaper. Once an operation is inside the storage engine, there is little to nothing that the MongoDB code can do to kill it.

Aside from the linked WT improvements, the only solution I can think of on our end would be to do a better job of throttling access into the storage engine. bruce.lucas, do you agree?
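
For context, one existing knob that comes close to this kind of throttling is the WiredTiger concurrent-transaction (ticket) limit, which can be lowered at runtime. A hedged sketch follows; the parameter exists, but the value of 32 and the URI are illustrative assumptions, not recommendations from this ticket:

    # Illustrative only: lowering the write-ticket count is one blunt way to limit
    # how many operations can be inside the storage engine at once. The value 32 is
    # an arbitrary example. Placeholder URI.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    client.admin.command("setParameter", 1, wiredTigerConcurrentWriteTransactions=32)

    # Ticket usage is visible under serverStatus().wiredTiger.concurrentTransactions.
    print(client.admin.command("serverStatus")["wiredTiger"]["concurrentTransactions"])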

Comment by Eric Milkie [ 06/Oct/20 ]

Quoting Bruce's question: "suppose an operation that is part of a transaction is stuck waiting for cache; will the expired transaction killer be able to abort the transaction? The stack traces above show the expired transaction killer is stuck in checkOutSessionForKill; could be because there is an operation in progress on that session that is stuck waiting for cache?"
Bruce, your intuition sounds correct to me; I believe the answer is: the expired transaction killer can only abort inactive (no operation in progress) multi-document transactions. If any operation gets stuck in the storage engine, the expired transaction killer won't be able to kill it.
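
A small way to observe the distinction described here is to leave a transaction idle past transactionLifetimeLimitSeconds and then attempt to commit. A sketch under assumed defaults (placeholder URI and namespace; multi-document transactions require a replica set):

    # An *idle* transaction is aborted by the expired-transaction killer once
    # transactionLifetimeLimitSeconds (default 60s) passes, so the later commit
    # fails with NoSuchTransaction. A transaction whose operation is stuck inside
    # the storage engine never becomes idle in this sense, which is the case
    # discussed in this ticket. Placeholder URI/namespace.
    import time

    from pymongo import MongoClient
    from pymongo.errors import OperationFailure

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    coll = client.test.demo

    with client.start_session() as session:
        session.start_transaction()
        coll.insert_one({"x": 1}, session=session)
        time.sleep(90)  # leave the transaction idle past the 60s default lifetime
        try:
            session.commit_transaction()
        except OperationFailure as exc:
            print("commit failed as expected:", exc.code)  # 251 NoSuchTransaction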

Comment by Bruce Lucas (Inactive) [ 06/Oct/20 ]

Thanks Mark, appreciate the detailed information.

Comment by Mark Weaver [ 06/Oct/20 ]

For gdb.log, those are two separate stack traces from two separate incidents – I uploaded the 2nd incident so that you could have matching gdb, mongo log, diagnostic data and currentOps output. For timestamps, I have 2020-10-01T15:45:08Z for the first incident and 2020-10-02T05:50:20Z for the second incident. If it helps, I still have the core dumps so I can further interrogate those (but I very likely can't upload them as they contain 3rd party data – I'd have to get permission from the 3rd party to share that).

Comment by Bruce Lucas (Inactive) [ 06/Oct/20 ]

milkie, louis.williams, question: suppose an operation that is part of a transaction is stuck waiting for cache; will the expired transaction killer be able to abort the transaction? The stack traces above show the expired transaction killer is stuck in checkOutSessionForKill; could be because there is an operation in progress on that session that is stuck waiting for cache?

Comment by Bruce Lucas (Inactive) [ 05/Oct/20 ]

Good point, mark.weaver@trelica.com. I've been looking into this further and I'm not sure this is really entirely a storage engine issue. The diagnostic data that you uploaded shows two incidents, and in one case there was a batch of inserts preceding the incident, but in the second case there was not.

In both cases, though, we see something like this (see the attached txn.png):

At A a transaction is started ("total started") but never commits ("total committed") and remains open ("current open"). We would expect such a transaction to be timed out after 60 seconds (at default settings, which you appear to be using). Below is the thread that is responsible for killing expired transactions; it appears to be trying to do so, but it is stuck waiting to check out a session:

Thread 34 (Thread 0x7f9586b4d700 (LWP 82)):
#0  0x00007f959b18ccf9 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00005578da34c507 in mongo::transport::TransportLayerASIO::BatonASIO::run(mongo::ClockSource*) ()
#2  0x00005578da81e16e in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil(mongo::stdx::condition_variable&, mongo::BasicLockableAdapter, mongo::Date_t) ()
#3  0x00005578d95d9e61 in mongo::SessionCatalog::checkOutSessionForKill(mongo::OperationContext*, mongo::SessionCatalog::KillToken) ()
#4  0x00005578d938ab3a in mongo::(anonymous namespace)::killSessionsAction(mongo::OperationContext*, mongo::SessionKiller::Matcher const&, std::function<bool (mongo::ObservableSession const&)> const&, std::function<void (mongo::OperationContext*, mongo::SessionCatalog::SessionToKill const&)> const&, mongo::ErrorCodes::Error) ()
#5  0x00005578d938b644 in mongo::killAllExpiredTransactions(mongo::OperationContext*) ()
#6  0x00005578d90933a5 in std::_Function_handler<void (mongo::Client*), mongo::PeriodicThreadToAbortExpiredTransactions::_init(mongo::ServiceContext*)::{lambda(mongo::Client*)#1}>::_M_invoke(std::_Any_data const&, mongo::Client*&&) ()
#7  0x00005578d9225136 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<mongo::stdx::thread::thread<mongo::PeriodicRunnerImpl::PeriodicJobImpl::_run()::{lambda()#2}, , 0>(mongo::PeriodicRunnerImpl::PeriodicJobImpl::_run()::{lambda()#2})::{lambda()#1}> > >::_M_run() ()
#8  0x00005578dab019df in execute_native_thread_routine ()
#9  0x00007f959b4706db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f959b199a3f in clone () from /lib/x86_64-linux-gnu/libc.so.6

I'm not sure what it is stuck waiting for, but we will look into it further.

Can you clarify what is in gdb.log - it appears to be two different stack trace samples, but I'm not sure if it's two consecutive samples from the same incident, or two samples from two different incidents. Do you have the timestamps of the samples to help clarify?
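
The counters referenced above ("total started", "total committed", "current open") come from the transactions section of serverStatus, so the same signal can be watched live as well as read back from diagnostic.data. A minimal sketch with a placeholder URI:

    # Watch the transaction counters discussed above; the same metrics are recorded
    # in diagnostic.data. Placeholder URI.
    import time

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")

    for _ in range(12):  # one sample every 5 seconds for a minute
        txn = client.admin.command("serverStatus")["transactions"]
        print(txn["totalStarted"], txn["totalCommitted"], txn["totalAborted"], txn["currentOpen"])
        time.sleep(5)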

Comment by Mark Weaver [ 03/Oct/20 ]

Yes, I can certainly give it a go with 4.4.2 (and, knowing what it is, I can try using larger transactions to see if that makes the behaviour more reproducible). It certainly fits with what we observe (it usually happens in the early AM when some batch jobs run).

There is still a bug in there somewhere though, because mongo continues to spin the CPU even after every client has been disconnected – if it was just a cache limit, I'd expect aborting all existing connections to free things up?

Thanks,
Mark
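
On the larger-transactions idea above, a rough reproduction sketch under stated assumptions (a replica set whose mongod was started with a small cache such as --wiredTigerCacheSizeGB 0.25; the URI, namespace and sizes are illustrative, not taken from this ticket) would be to keep writing uncommitted data inside a single transaction until the dirty limit is hit:

    # Rough reproduction sketch for the small-cache / large-transaction theory.
    # Assumptions: replica set, small WT cache (e.g. --wiredTigerCacheSizeGB 0.25),
    # placeholder URI/namespace/sizes. Note that the 60s default
    # transactionLifetimeLimitSeconds may abort the transaction if the loop is slow.
    import os

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    coll = client.test.repro

    with client.start_session() as session:
        session.start_transaction()
        # ~200MB of uncommitted data, well past a ~50MB dirty limit on a 250MB cache.
        for _ in range(200):
            docs = [{"payload": os.urandom(512 * 1024)} for _ in range(2)]
            coll.insert_many(docs, session=session)
        session.commit_transaction()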

Comment by Bruce Lucas (Inactive) [ 02/Oct/20 ]

Thanks Mark.

You have a fairly small cache at ~250MB, which only allows up to ~50MB of dirty content in the cache. WiredTiger is able to evict all committed content from the cache, but can't evict uncommitted content, so it is relatively easy to reach the dirty limit before the content has been committed. The stall happens immediately after a cluster of inserts, which generates enough dirty content that the cache becomes stuck full of it.
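
Those numbers can be checked directly against the WiredTiger section of serverStatus. A short sketch, assuming a placeholder URI and the default 20% eviction dirty trigger (the ~50MB figure above is 20% of a ~250MB cache):

    # Compare current dirty content to the default ~20% dirty trigger of the
    # configured cache. Placeholder URI; 0.20 is the WiredTiger default
    # eviction_dirty_trigger, not a value read from this deployment.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]

    cache_max = cache["maximum bytes configured"]
    dirty = cache["tracked dirty bytes in the cache"]
    print(f"cache: {cache_max / 2**20:.0f} MB, "
          f"dirty: {dirty / 2**20:.1f} MB, "
          f"dirty trigger (~20%): {0.20 * cache_max / 2**20:.0f} MB")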

Since you were able to run your workload with 4.0, we believe that WT-6490 should help in 4.4.2. Would you be able to test with 4.4.2 when it becomes available and let us know the outcome?

Additional future work in WT-6624 and WT-6707 should further improve the behavior.

Comment by Mark Weaver [ 02/Oct/20 ]

@Bruce Lucas: I have put some log files from another instance of the same problem in the S3 bucket, using the link you provided above: the output of db.currentOp(), which shows a bunch of threads waiting for latches; the output of gdb, which shows roughly the same; and the mongod logs and diagnostic.data for the relevant period. I also have a core dump locally – it's doubtful I can share that, but I can interrogate it for you if that's helpful.
Thanks,
Mark

Comment by Mark Weaver [ 01/Oct/20 ]

I've attached the diagnostics again – the last upload failed, so I've cut it down a bit as I guess there's a size limit.
As for mongodb.log, I don't actually have it directly – the data was imported into an Elasticsearch instance, so if there's anything you are particularly interested in, I should be able to extract it (but it will be JSON or whatever format Elastic spits out, so let me know if you want that done).
Thanks!

Comment by Bruce Lucas (Inactive) [ 01/Oct/20 ]

mark.weaver@trelica.com, thanks for the report. Can you please archive and attach to this ticket the $dbpath/diagnostic.data directory? It will also be helpful to have the timestamps of the incident(s).

Also, can you attach the mongodb log? If it is too large or you don't want to post it on a public ticket you can upload it to this secure private portal.

Comment by Mark Weaver [ 01/Oct/20 ]

(Sorry about the mess – I accidentally hit enter, so I ended up with a half-formed bug that I can't see how to edit.)

