[SERVER-37849] Poor replication performance and cache-full hang on secondary due to pinned content Created: 31/Oct/18  Updated: 21/Jan/23

Status: Backlog
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Backlog - Storage Engines Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File hang.png     PNG File hang8.png     File hang8.tar     File repro-10MBx2-push.sh     File repro-10MBx8-push-fast.sh     File repro-10MBx8-push.sh    
Issue Links:
Depends
depends on WT-4421 Add a way to calculate modify operations Closed
depends on WT-4403 Add statistic tracking accumulated di... Closed
Related
is related to SERVER-33191 Cache-full hangs on 3.6 Closed
is related to SERVER-34938 Secondary slowdown or hang due to con... Closed
is related to WT-4422 Don't queue clean pages for urgent ev... Closed
Assigned Teams:
Storage Engines
Operating System: ALL

 Description   

This is a follow-on to SERVER-33191. The repro in that ticket updated large documents by incrementing a field; this repro is similar except that it updates the document by pushing a value onto an array. The symptoms of the increment use case disappeared in 3.6.7, but the push use case still shows similar bad symptoms in 3.6.8: the secondary takes much longer to apply the updates than the primary, and eventually hangs with the cache full.
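
For concreteness, here is a minimal sketch of the push-based workload described above. It is an illustration written against the pymongo driver, not the attached repro script; the collection name, document count, payload size, and connection string are all assumptions.

    # Illustrative workload sketch: grow a handful of documents toward ~10 MB
    # each by repeatedly pushing a value onto an array field, so the secondary
    # has to apply updates against ever-larger documents.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # assumed replica set
    coll = client.test.push_repro  # hypothetical namespace

    NDOCS = 8            # 2 or 8 documents, depending on the repro variant
    VALUE = "x" * 1024   # ~1 KB payload pushed per update

    # Seed the documents with an empty array.
    for i in range(NDOCS):
        coll.replace_one({"_id": i}, {"_id": i, "a": []}, upsert=True)

    # ~10,000 pushes of ~1 KB grow each document to roughly 10 MB.
    for _ in range(10000):
        for i in range(NDOCS):
            coll.update_one({"_id": i}, {"$push": {"a": VALUE}})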

Referring to the attached graph (hang.png):
  • A-B: secondary member 2 is stopped while some updates are applied
  • B-C: secondary member 2 is restarted and works to catch up, but applies the updates much more slowly than the primary and the other secondary did, due to SERVER-34938
  • C: the secondary is now hung waiting for the cache

    #1  0x000055bb44ea3f39 in __wt_cond_wait_signal ()
    #2  0x000055bb44e8cfca in __wt_cache_eviction_worker ()
    #3  0x000055bb44ee553c in __wt_txn_commit ()
    #4  0x000055bb44ecf28b in __session_commit_transaction ()
    #5  0x000055bb44e5ff88 in mongo::WiredTigerRecoveryUnit::_txnClose(bool) ()
    #6  0x000055bb44e6045a in mongo::WiredTigerRecoveryUnit::_commit() ()
    #7  0x000055bb44e1178a in mongo::WriteUnitOfWork::commit() ()
    #8  0x000055bb452dfd76 in mongo::repl::applyOperation_inlock(mongo::OperationContext*, mongo::Database*, mongo::BSONObj const&, bool, mongo::repl::OplogApplication::Mode, std::function<void ()>)::{lambda()#15}::operator()() const ()
    

    Other threads are blocked waiting for the cache as well, including the checkpoint thread:

    #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
    #1  0x000055bb44ea3f39 in __wt_cond_wait_signal ()
    #2  0x000055bb44e8cfca in __wt_cache_eviction_worker ()
    #3  0x000055bb44ecfa68 in __session_begin_transaction ()
    #4  0x000055bb44e65e96 in mongo::WiredTigerSizeStorer::syncCache(bool) ()
    #5  0x000055bb44e450b9 in mongo::WiredTigerKVEngine::syncSizeInfo(bool) const ()
    #6  0x000055bb44e488d5 in mongo::WiredTigerKVEngine::haveDropsQueued() const ()
    #7  0x000055bb44e627f6 in mongo::WiredTigerSessionCache::releaseSession(mongo::WiredTigerSession*) ()
    #8  0x000055bb44e4eb54 in mongo::WiredTigerKVEngine::WiredTigerCheckpointThread::run() ()
    



 Comments   
Comment by Bruce Lucas (Inactive) [ 07/Nov/18 ]

I've attached a new reproducer, repro-10MBx8-push-fast.sh. It removes the numactl that restricted mongod to a single core, and reduces the number of update operations to allow the script to be run multiple times in succession more quickly. The script waits for the updates to be replicated and exits when they have, so that it can be run in a loop until it hangs. Generally I've found that there's a fairly high likelihood of it hanging on any run of the script.
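
As a convenience, here is a hypothetical driver loop for running the script repeatedly until it hangs; the loop itself is not part of the attachment, and the 10-minute timeout is an arbitrary assumption.

    # Hypothetical driver: rerun the repro until a run fails to finish in time,
    # which we treat as the hang; tune the timeout to your hardware.
    import subprocess

    run = 0
    while True:
        run += 1
        try:
            subprocess.run(["./repro-10MBx8-push-fast.sh"], check=True, timeout=600)
            print(f"run {run} completed; replication caught up, retrying")
        except subprocess.TimeoutExpired:
            print(f"run {run} did not finish within 10 minutes - likely hung;")
            print("leave the processes up and collect stacks/FTDC from the secondary")
            break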

Comment by Bruce Lucas (Inactive) [ 07/Nov/18 ]

I've attached another repro script that updates 8 documents instead of 2. This creates more parallelism on the secondary, so it may create more opportunities for hangs if there are race conditions involved.

Here's a hang using this script (see the attached hang8.png):

It seems unusual that it is hung with a reported dirty content of essentially 0 (0.026 MB, 1 page). The total (non-dirty) cache content is 327% so maybe that's why the threads are waiting, but why isn't that content being evicted so they can proceed?
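
For reference, the cache-fill and dirty figures quoted above can be read live from the secondary's serverStatus; a minimal sketch follows (the port is an assumption, and FTDC captures the same WiredTiger cache statistics).

    # Read WiredTiger cache fill and dirty bytes from serverStatus on the
    # (assumed) secondary at localhost:27018.
    from pymongo import MongoClient

    ss = MongoClient("mongodb://localhost:27018").admin.command("serverStatus")
    cache = ss["wiredTiger"]["cache"]
    total = cache["bytes currently in the cache"]
    dirty = cache["tracked dirty bytes in the cache"]
    limit = cache["maximum bytes configured"]
    print(f"cache fill: {100.0 * total / limit:.0f}%   dirty: {dirty / 1e6:.3f} MB")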

FTDC data is also attached as hang8.tar.

All 8 oplog applier threads (one for each of the 8 documents) are stuck in the same place, waiting for the cache:

Thread 26 (Thread 0x7f5cd157f700 (LWP 1927)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00005648e3626f39 in __wt_cond_wait_signal ()
#2  0x00005648e360ffca in __wt_cache_eviction_worker ()
#3  0x00005648e366853c in __wt_txn_commit ()
#4  0x00005648e365228b in __session_commit_transaction ()
#5  0x00005648e35e2f88 in mongo::WiredTigerRecoveryUnit::_txnClose(bool) ()
#6  0x00005648e35e345a in mongo::WiredTigerRecoveryUnit::_commit() ()
#7  0x00005648e359478a in mongo::WriteUnitOfWork::commit() ()
#8  0x00005648e3a62d76 in mongo::repl::applyOperation_inlock(mongo::OperationContext*, mongo::Database*, mongo::BSONObj const&, bool, mongo::repl::OplogApplication::Mode, std::function<void ()>)::{lambda()#15}::operator()() const ()
#9  0x00005648e3a6b4d5 in mongo::repl::applyOperation_inlock(mongo::OperationContext*, mongo::Database*, mongo::BSONObj const&, bool, mongo::repl::OplogApplication::Mode, std::function<void ()>) ()
#10 0x00005648e3986919 in std::_Function_handler<mongo::Status (mongo::OperationContext*, mongo::Database*, mongo::BSONObj const&, bool, mongo::repl::OplogApplication::Mode, std::function<void ()>), mong
