  1. Core Server
  2. SERVER-17316

rc7 many threads "stuck" in pthread_cond_timedwait

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.0.0-rc8
    • Affects Version/s: 3.0.0-rc7
    • Component/s: Storage
    • Fully Compatible
    • ALL
    • Steps To Reproduce:

      benchrun.py (mongo-perf) update workloads on Linux or on Windows

      in or near Update.MmsIncShallow1
      in or near Update.IncFewSmallDoc
      in or near Update.v3.IncWithIndex


      Just documenting this for our Project Manager; no symptoms were seen in RC8 or in RC9-pre.

      mongod running WiredTiger becomes unresponsive, or perhaps just slow enough to appear unresponsive. Multiple write operations appear "stuck", with many threads repeatedly waiting in pthread_cond_timedwait, apparently related to the WT cache.

      I attached to the Linux process with gdb and found that it was spawning threads rapidly, with many threads waiting on a condition variable, apparently a WT cache wait.

      (gdb) info threads
        15 Thread 0x7fdd10f36700 (LWP 6554)  0x00007fdd17d174b5 in sigwait ()
         from /lib64/libpthread.so.0
        14 Thread 0x7fdd10535700 (LWP 6555)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        13 Thread 0x7fdd0fb34700 (LWP 6556)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        12 Thread 0x7fdd0f133700 (LWP 6557)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        11 Thread 0x7fdd0e732700 (LWP 6558)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        10 Thread 0x7fdd0dd31700 (LWP 6559)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        9 Thread 0x7fdd0d330700 (LWP 6560)  0x00007fdd17d135bc in pthread_cond_wait@@GLIBC_2.3.2 ()
         from /lib64/libpthread.so.0
        8 Thread 0x7fdd0c92f700 (LWP 6561)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        7 Thread 0x7fdd0bf2e700 (LWP 6562)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        6 Thread 0x7fdd0b52d700 (LWP 6563)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        5 Thread 0x7fdd0ab2c700 (LWP 6564)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        4 Thread 0x7fdd0a02a700 (LWP 6592)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
        3 Thread 0x7fdd09821700 (LWP 6693)  0x000000000131c582 in ?? ()
        2 Thread 0x7fdd08e20700 (LWP 6694)  0x000000000131c56b in ?? ()
      * 1 Thread 0x7fdd18138b60 (LWP 6553)  0x00007fdd16ea95d3 in select () from /lib64/libc.so.6
      
      
      
      (gdb) bt
      #0  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x000000000133468f in __wt_cond_wait ()
      #2  0x000000000131c80c in __wt_cache_wait ()
      #3  0x00000000012cd8ca in __wt_btcur_search ()
      #4  0x00000000013078f3 in ?? ()
      #5  0x0000000000d55b31 in mongo::WiredTigerRecordStore::findRecord(mongo::OperationContext*, mongo::RecordId const&, mongo::RecordData*) const ()
      #6  0x0000000000cc9df1 in mongo::KVCatalog::_findEntry(mongo::OperationContext*, mongo::StringData const&, mongo::RecordId*) const ()
      #7  0x0000000000cca050 in mongo::KVCatalog::getMetaData(mongo::OperationContext*, mongo::StringData const&) ()
      #8  0x0000000000cceba5 in mongo::KVCollectionCatalogEntry::_getMetaData(mongo::OperationContext*) const ()
      #9  0x0000000000cae9c6 in mongo::BSONCollectionCatalogEntry::getTotalIndexCount(mongo::OperationContext*) const ()
      #10 0x00000000009bc0f1 in mongo::CmdDrop::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
      #11 0x00000000009b71a4 in mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
      #12 0x00000000009b80e3 in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) ()
      #13 0x00000000009b8cdb in mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) ()
      #14 0x0000000000b87f95 in mongo::runQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::NamespaceString const&, mongo::CurOp&, mongo::Message&, bool) ()
      #15 0x0000000000a99f88 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&, bool) ()
      #16 0x00000000007e6730 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
      #17 0x0000000000ef8aab in mongo::PortMessageServer::handleIncomingMsg(void*) ()
      #18 0x00007fdd17d0f9d1 in start_thread () from /lib64/libpthread.so.0
      #19 0x00007fdd16eb0b5d in clone () from /lib64/libc.so.6
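
The `info threads` listing above can be summarized mechanically rather than by eye. The following is a minimal sketch, not part of the original report: it assumes the gdb output has been saved as text in the same layout as shown above, and the names `THREAD_LINE` and `tally_thread_functions` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Matches one thread entry of gdb's `info threads` output, for example:
#   14 Thread 0x7fdd10535700 (LWP 6555)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from ...
# The optional leading "*" marks the currently selected thread.
THREAD_LINE = re.compile(
    r"^\s*\*?\s*(?P<id>\d+)\s+Thread\s+0x[0-9a-f]+\s+\(LWP\s+\d+\)\s+"
    r"0x[0-9a-f]+\s+in\s+(?P<func>\S+)\s+\(\)"
)


def tally_thread_functions(gdb_output: str) -> Counter:
    """Count threads by the innermost function reported for each thread."""
    counts = Counter()
    for line in gdb_output.splitlines():
        match = THREAD_LINE.match(line)
        if match:
            # Drop the glibc symbol-version suffix, e.g. "@@GLIBC_2.3.2".
            func = match.group("func").split("@@")[0]
            counts[func] += 1
    return counts


# Abbreviated sample copied from the listing above (not the full output).
sample = """\
  15 Thread 0x7fdd10f36700 (LWP 6554)  0x00007fdd17d174b5 in sigwait ()
   from /lib64/libpthread.so.0
  14 Thread 0x7fdd10535700 (LWP 6555)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13 Thread 0x7fdd0fb34700 (LWP 6556)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7fdd09821700 (LWP 6693)  0x000000000131c582 in ?? ()
* 1 Thread 0x7fdd18138b60 (LWP 6553)  0x00007fdd16ea95d3 in select () from /lib64/libc.so.6
"""

if __name__ == "__main__":
    for func, n in tally_thread_functions(sample).most_common():
        print(f"{n:3d}  {func}")
```

Applied to the full listing above, a tally like this would show 10 of the 15 threads sitting in pthread_cond_timedwait. Frames that gdb could not resolve are counted under `??`.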
      

            Assignee: Unassigned
            Reporter: Quentin Conner (quentin.conner, Inactive)
            Votes: 0
            Watchers: 7
