Core Server / SERVER-38735

Extended stalls under cache pressure

    • v4.2, v4.0
    • Storage NYC 2018-10-08, Storage NYC 2018-10-22, Storage NYC 2018-11-05, Storage Engines 2019-06-17
    • 5

      The attached script (repro-repl-lag2.sh, taken from another ticket) creates cache pressure on the primary by lagging the secondary. This is not an ideal condition for the storage engine, but 3.6 appears to handle this workload significantly better than 4.0:

      • under 3.6, the same workload completed significantly faster;
      • under 4.0, there are extended total stalls of up to 80 s, whereas the worst stall under 3.6 in these tests was about 8 s;
      • 4.0.2 may be slightly worse than 4.0.0, but the erratic performance makes the data noisy, so much longer runs would probably be needed to tell whether the difference is real.
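
      One way to quantify a "total stall" like those above is to look for runs of consecutive seconds in which no operations complete. The following is a minimal sketch, assuming a list of cumulative operation counts sampled once per second (for example, insert counts read out of the FTDC data; the extraction step is not shown). The function names `per_second_deltas` and `longest_stall` are hypothetical and not part of any MongoDB tool.

```python
from typing import List, Tuple


def per_second_deltas(cumulative: List[int]) -> List[int]:
    """Convert cumulative counter samples (assumed one per second) into per-second counts."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def longest_stall(ops_per_second: List[int], threshold: int = 0) -> Tuple[int, int]:
    """Return (start_second, length_in_seconds) of the longest run of seconds
    in which at most `threshold` operations completed; (-1, 0) if there is none."""
    best_start, best_len = -1, 0
    run_start, run_len = -1, 0
    for second, ops in enumerate(ops_per_second):
        if ops <= threshold:
            if run_len == 0:
                run_start = second  # a new stall begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # throughput resumed; the current stall (if any) ends


    return best_start, best_len


# Hypothetical samples: throughput drops to zero for three seconds.
samples = [0, 100, 200, 200, 200, 200, 300]
print(longest_stall(per_second_deltas(samples)))  # (2, 3)
```

      Setting `threshold` above zero treats near-zero throughput as a stall too, which may be more useful when a handful of operations trickle through during an eviction stall.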

      FTDC data and 9 stack samples taken during one of the stalls are attached. A few of the top stacks, with counts, are shown below. In each of the 9 samples, one application thread is doing I/O while evicting data and the other threads are waiting.

       
      109 pthread_cond_timedwait@@GLIBC_2.3.2:238;__wt_cond_wait_signal;__wt_cache_eviction_worker;__wt_txn_commit;__session_commit_transaction;mongo::WiredTigerRecoveryUnit::_txnClose;mongo::WiredTigerRecoveryUnit::_commit;mongo::WriteUnitOfWork::commit;mongo::(anonymous namespace)::insertDocuments;mongo::performInserts;mongo::(anonymous namespace)::CmdInsert::Invocation::runImpl;mongo::(anonymous namespace)::WriteCommand::InvocationBase::run;mongo::(anonymous namespace)::invokeInTransaction;mongo::(anonymous namespace)::execCommandDatabase;mongo::(anonymous namespace)::receivedCommands;mongo::ServiceEntryPointCommon::handleRequest;mongo::ServiceEntryPointMongod::handleRequest;mongo::ServiceStateMachine::_processMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;mongo::transport::ServiceExecutorSynchronous::schedule;mongo::ServiceStateMachine::_scheduleNextWithGuard;mongo::ServiceStateMachine::_sourceCallback;mongo::ServiceStateMachine::_sourceMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;std::_Function_handler<...>;mongo::(anonymous namespace)::runFunc;start_thread:333;clone:109
        15 __lll_lock_wait:135;__GI___pthread_mutex_lock:80;__split_internal_lock;__wt_split_multi;__wt_evict;__evict_page;__wt_cache_eviction_worker;__wt_txn_commit;__session_commit_transaction;mongo::WiredTigerRecoveryUnit::_txnClose;mongo::WiredTigerRecoveryUnit::_commit;mongo::WriteUnitOfWork::commit;mongo::(anonymous namespace)::insertDocuments;mongo::performInserts;mongo::(anonymous namespace)::CmdInsert::Invocation::runImpl;mongo::(anonymous namespace)::WriteCommand::InvocationBase::run;mongo::(anonymous namespace)::invokeInTransaction;mongo::(anonymous namespace)::execCommandDatabase;mongo::(anonymous namespace)::receivedCommands;mongo::ServiceEntryPointCommon::handleRequest;mongo::ServiceEntryPointMongod::handleRequest;mongo::ServiceStateMachine::_processMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;mongo::transport::ServiceExecutorSynchronous::schedule;mongo::ServiceStateMachine::_scheduleNextWithGuard;mongo::ServiceStateMachine::_sourceCallback;mongo::ServiceStateMachine::_sourceMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;std::_Function_handler<...>;mongo::(anonymous namespace)::runFunc;start_thread:333;clone:109
         9 pwrite64:81;__posix_file_write;__block_write_off;__wt_block_write;__wt_bt_write;__rec_split_write;__wt_reconcile;__wt_evict;__evict_page;__wt_cache_eviction_worker;__wt_txn_commit;__session_commit_transaction;mongo::WiredTigerRecoveryUnit::_txnClose;mongo::WiredTigerRecoveryUnit::_commit;mongo::WriteUnitOfWork::commit;mongo::(anonymous namespace)::insertDocuments;mongo::performInserts;mongo::(anonymous namespace)::CmdInsert::Invocation::runImpl;mongo::(anonymous namespace)::WriteCommand::InvocationBase::run;mongo::(anonymous namespace)::invokeInTransaction;mongo::(anonymous namespace)::execCommandDatabase;mongo::(anonymous namespace)::receivedCommands;mongo::ServiceEntryPointCommon::handleRequest;mongo::ServiceEntryPointMongod::handleRequest;mongo::ServiceStateMachine::_processMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;mongo::transport::ServiceExecutorSynchronous::schedule;mongo::ServiceStateMachine::_scheduleNextWithGuard;mongo::ServiceStateMachine::_sourceCallback;mongo::ServiceStateMachine::_sourceMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;std::_Function_handler<...>;mongo::(anonymous namespace)::runFunc;start_thread:333;clone:109
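
      The stacks above are in a folded, one-line-per-stack format: a sample count followed by `;`-separated frames, innermost frame first. A minimal sketch for summarizing such samples follows, assuming the full stall-stacks.txt uses the same format (not verified here); the function names are hypothetical. It counts how many thread samples have a given frame on their stack and groups them by innermost frame. Applied to the three stacks shown, it makes the described pattern visible: all 133 thread samples are inside `__wt_cache_eviction_worker`, only 24 are inside `__wt_evict` itself, and 109 are waiting in `pthread_cond_timedwait`.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def strip_line_suffix(frame: str) -> str:
    """Drop a trailing ':<line>' such as 'pwrite64:81', but leave C++ '::' names intact."""
    name, sep, suffix = frame.rpartition(":")
    return name if sep and suffix.isdigit() else frame


def parse_folded(lines: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (count, frames) for lines of the form '<count> innermost;...;outermost'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        count_str, _, stack = line.partition(" ")  # frames may themselves contain spaces
        yield int(count_str), [strip_line_suffix(f) for f in stack.split(";")]


def summarize(lines: Iterable[str], frame_of_interest: str) -> Tuple[int, int, Counter]:
    """Return (total samples, samples with frame_of_interest on the stack, counts by innermost frame)."""
    total = matching = 0
    innermost: Counter = Counter()
    for count, frames in parse_folded(lines):
        total += count
        innermost[frames[0]] += count  # in this format the innermost frame comes first
        if frame_of_interest in frames:
            matching += count
    return total, matching, innermost


# The three stacks above, abbreviated: frames between __wt_txn_commit and start_thread are omitted.
SAMPLE = [
    "109 pthread_cond_timedwait@@GLIBC_2.3.2:238;__wt_cond_wait_signal;__wt_cache_eviction_worker;__wt_txn_commit;start_thread:333;clone:109",
    " 15 __lll_lock_wait:135;__GI___pthread_mutex_lock:80;__split_internal_lock;__wt_split_multi;__wt_evict;__evict_page;__wt_cache_eviction_worker;__wt_txn_commit;start_thread:333;clone:109",
    "  9 pwrite64:81;__posix_file_write;__block_write_off;__wt_block_write;__wt_bt_write;__rec_split_write;__wt_reconcile;__wt_evict;__evict_page;__wt_cache_eviction_worker;__wt_txn_commit;start_thread:333;clone:109",
]

total, in_eviction, innermost = summarize(SAMPLE, "__wt_cache_eviction_worker")
_, evicting, _ = summarize(SAMPLE, "__wt_evict")
print(total, in_eviction, evicting)  # 133 133 24
print(innermost.most_common(1))      # [('pthread_cond_timedwait@@GLIBC_2.3.2', 109)]
```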
      

        Attachments:
        1. 3_6_0_repro.png (61 kB, Alex Cameron)
        2. 4_0_10_repro.png (69 kB, Alex Cameron)
        3. 4_0_2_repro.png (77 kB, Alex Cameron)
        4. dd.tgz (5.74 MB, Bruce Lucas)
        5. no-alter.png (120 kB, Bruce Lucas)
        6. repro-repl-lag2.sh (2 kB, Bruce Lucas)
        7. stalling.png (592 kB, Alex Cameron)
        8. stalls.png (230 kB, Bruce Lucas)
        9. stall-stacks.txt (1.44 MB, Bruce Lucas)

            Assignee: Kelsey Schubert (kelsey.schubert@mongodb.com)
            Reporter: Bruce Lucas (Inactive) (bruce.lucas@mongodb.com)
            Votes: 1
            Watchers: 25
