Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-38735

Extended stalls under cache pressure

    XMLWordPrintable

    Details

    • Backport Requested:
      v4.2, v4.0
    • Sprint:
      Storage NYC 2018-10-08, Storage NYC 2018-10-22, Storage NYC 2018-11-05, Storage Engines 2019-06-17
    • Case:
    • Story Points:
      5

      Description

      Attached script (taken from another ticket) creates cache pressure on the primary by lagging the secondary. This isn't an ideal condition for the storage engine, but it appears that 3.6 did significantly better on this workload than 4.0:

      • under 3.6 the same workload completed significantly faster
      • under 4.0 there are extended total stalls of up to 80s, whereas worst under 3.6 in these tests was about 8s.
      • it appears that 4.0.2 may be a little worse than 4.0.0, although the erratic performance makes the data noisy so some much longer runs would probably be needed to see if there is a real difference

      FTDC data and 9 stack samples taken during one of the stalls attached. Here are a couple of the top stacks with counts; in each of the 9 samples one of the application threads doing i/o while evicting data and other threads are waiting.

       
      109 pthread_cond_timedwait@@GLIBC_2.3.2:238;__wt_cond_wait_signal;__wt_cache_eviction_worker;__wt_txn_commit;__session_commit_transaction;mongo::WiredTigerRecoveryUnit::_txnClose;mongo::WiredTigerRecoveryUnit::_commit;mongo::WriteUnitOfWork::commit;mongo::(anonymous namespace)::insertDocuments;mongo::performInserts;mongo::(anonymous namespace)::CmdInsert::Invocation::runImpl;mongo::(anonymous namespace)::WriteCommand::InvocationBase::run;mongo::(anonymous namespace)::invokeInTransaction;mongo::(anonymous namespace)::execCommandDatabase;mongo::(anonymous namespace)::receivedCommands;mongo::ServiceEntryPointCommon::handleRequest;mongo::ServiceEntryPointMongod::handleRequest;mongo::ServiceStateMachine::_processMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;mongo::transport::ServiceExecutorSynchronous::schedule;mongo::ServiceStateMachine::_scheduleNextWithGuard;mongo::ServiceStateMachine::_sourceCallback;mongo::ServiceStateMachine::_sourceMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;std::_Function_handler<...>;mongo::(anonymous namespace)::runFunc;start_thread:333;clone:109
        15 __lll_lock_wait:135;__GI___pthread_mutex_lock:80;__split_internal_lock;__wt_split_multi;__wt_evict;__evict_page;__wt_cache_eviction_worker;__wt_txn_commit;__session_commit_transaction;mongo::WiredTigerRecoveryUnit::_txnClose;mongo::WiredTigerRecoveryUnit::_commit;mongo::WriteUnitOfWork::commit;mongo::(anonymous namespace)::insertDocuments;mongo::performInserts;mongo::(anonymous namespace)::CmdInsert::Invocation::runImpl;mongo::(anonymous namespace)::WriteCommand::InvocationBase::run;mongo::(anonymous namespace)::invokeInTransaction;mongo::(anonymous namespace)::execCommandDatabase;mongo::(anonymous namespace)::receivedCommands;mongo::ServiceEntryPointCommon::handleRequest;mongo::ServiceEntryPointMongod::handleRequest;mongo::ServiceStateMachine::_processMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;mongo::transport::ServiceExecutorSynchronous::schedule;mongo::ServiceStateMachine::_scheduleNextWithGuard;mongo::ServiceStateMachine::_sourceCallback;mongo::ServiceStateMachine::_sourceMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;std::_Function_handler<...>;mongo::(anonymous namespace)::runFunc;start_thread:333;clone:109
         9 pwrite64:81;__posix_file_write;__block_write_off;__wt_block_write;__wt_bt_write;__rec_split_write;__wt_reconcile;__wt_evict;__evict_page;__wt_cache_eviction_worker;__wt_txn_commit;__session_commit_transaction;mongo::WiredTigerRecoveryUnit::_txnClose;mongo::WiredTigerRecoveryUnit::_commit;mongo::WriteUnitOfWork::commit;mongo::(anonymous namespace)::insertDocuments;mongo::performInserts;mongo::(anonymous namespace)::CmdInsert::Invocation::runImpl;mongo::(anonymous namespace)::WriteCommand::InvocationBase::run;mongo::(anonymous namespace)::invokeInTransaction;mongo::(anonymous namespace)::execCommandDatabase;mongo::(anonymous namespace)::receivedCommands;mongo::ServiceEntryPointCommon::handleRequest;mongo::ServiceEntryPointMongod::handleRequest;mongo::ServiceStateMachine::_processMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;mongo::transport::ServiceExecutorSynchronous::schedule;mongo::ServiceStateMachine::_scheduleNextWithGuard;mongo::ServiceStateMachine::_sourceCallback;mongo::ServiceStateMachine::_sourceMessage;mongo::ServiceStateMachine::_runNextInGuard;std::_Function_handler<...>;std::_Function_handler<...>;mongo::(anonymous namespace)::runFunc;start_thread:333;clone:109
      

        Attachments

        1. 3_6_0_repro.png
          3_6_0_repro.png
          61 kB
        2. 4_0_10_repro.png
          4_0_10_repro.png
          69 kB
        3. 4_0_2_repro.png
          4_0_2_repro.png
          77 kB
        4. dd.tgz
          5.74 MB
        5. no-alter.png
          no-alter.png
          120 kB
        6. repro-repl-lag2.sh
          2 kB
        7. stalling.png
          stalling.png
          592 kB
        8. stalls.png
          stalls.png
          230 kB
        9. stall-stacks.txt
          1.44 MB

          Activity

            People

            Assignee:
            bruce.lucas Bruce Lucas
            Reporter:
            bruce.lucas Bruce Lucas
            Participants:
            Votes:
            1 Vote for this issue
            Watchers:
            27 Start watching this issue

              Dates

              Created:
              Updated:

                  PagerDuty