[SERVER-17316] rc7 many threads "stuck" in pthread_cond_timedwait Created: 18/Feb/15  Updated: 02/Mar/15  Resolved: 19/Feb/15

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 3.0.0-rc7
Fix Version/s: 3.0.0-rc8

Type: Bug Priority: Major - P3
Reporter: Quentin Conner Assignee: Unassigned
Resolution: Done Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Run the benchrun.py (mongo-perf) update workloads on Linux or on Windows. The problem appears in or near these tests:

Update.MmsIncShallow1
Update.IncFewSmallDoc
Update.v3.IncWithIndex

Participants:

 Description   

This ticket documents the issue for our Project Manager. No symptoms have been seen in RC8 or in RC9-pre.

mongod running WiredTiger becomes unresponsive, or possibly just slow enough to appear unresponsive. Multiple write operations appear "stuck", with many threads repeatedly waiting in pthread_cond_timedwait, apparently related to the WT cache.

I attached to the Linux process with gdb and found that it was spawning threads rapidly, with many threads waiting on a condition variable, apparently a WT cache wait.

(gdb) info threads
  15 Thread 0x7fdd10f36700 (LWP 6554)  0x00007fdd17d174b5 in sigwait ()
   from /lib64/libpthread.so.0
  14 Thread 0x7fdd10535700 (LWP 6555)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13 Thread 0x7fdd0fb34700 (LWP 6556)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12 Thread 0x7fdd0f133700 (LWP 6557)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11 Thread 0x7fdd0e732700 (LWP 6558)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10 Thread 0x7fdd0dd31700 (LWP 6559)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9 Thread 0x7fdd0d330700 (LWP 6560)  0x00007fdd17d135bc in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  8 Thread 0x7fdd0c92f700 (LWP 6561)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7fdd0bf2e700 (LWP 6562)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x7fdd0b52d700 (LWP 6563)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7fdd0ab2c700 (LWP 6564)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7fdd0a02a700 (LWP 6592)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7fdd09821700 (LWP 6693)  0x000000000131c582 in ?? ()
  2 Thread 0x7fdd08e20700 (LWP 6694)  0x000000000131c56b in ?? ()
* 1 Thread 0x7fdd18138b60 (LWP 6553)  0x00007fdd16ea95d3 in select () from /lib64/libc.so.6
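
A tally of the top frame per thread makes the pattern in the listing above easier to see (10 of the 15 threads are in pthread_cond_timedwait). The following is a minimal sketch, not part of the original report: the helper name count_top_frames and the regular expression are assumptions for illustration, and the pattern is only known to fit the gdb output format shown here.

```python
import re
from collections import Counter

# Matches the "in <function> ()" part of a gdb `info threads` line.
# Assumption: frames are printed as in the listing above.
FRAME_RE = re.compile(r"\bin (\S+) \(\)")


def count_top_frames(info_threads_output: str) -> Counter:
    """Count how many threads are sitting in each top-of-stack function."""
    counts = Counter()
    for line in info_threads_output.splitlines():
        match = FRAME_RE.search(line)
        if match:
            # Drop the glibc symbol version suffix, e.g. "@@GLIBC_2.3.2".
            counts[match.group(1).split("@@")[0]] += 1
    return counts


# A few threads copied from the listing above.
SAMPLE = """\
  15 Thread 0x7fdd10f36700 (LWP 6554)  0x00007fdd17d174b5 in sigwait ()
   from /lib64/libpthread.so.0
  14 Thread 0x7fdd10535700 (LWP 6555)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13 Thread 0x7fdd0fb34700 (LWP 6556)  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   3 Thread 0x7fdd09821700 (LWP 6693)  0x000000000131c582 in ?? ()
"""

for function, threads in count_top_frames(SAMPLE).most_common():
    print(f"{threads:3d}  {function}")
```

Wrapped continuation lines such as "from /lib64/libpthread.so.0" contain no frame and are skipped, so each thread is counted once.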
 
 
 
(gdb) bt
#0  0x00007fdd17d1398e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000133468f in __wt_cond_wait ()
#2  0x000000000131c80c in __wt_cache_wait ()
#3  0x00000000012cd8ca in __wt_btcur_search ()
#4  0x00000000013078f3 in ?? ()
#5  0x0000000000d55b31 in mongo::WiredTigerRecordStore::findRecord(mongo::OperationContext*, mongo::RecordId const&, mongo::RecordData*) const ()
#6  0x0000000000cc9df1 in mongo::KVCatalog::_findEntry(mongo::OperationContext*, mongo::StringData const&, mongo::RecordId*) const ()
#7  0x0000000000cca050 in mongo::KVCatalog::getMetaData(mongo::OperationContext*, mongo::StringData const&) ()
#8  0x0000000000cceba5 in mongo::KVCollectionCatalogEntry::_getMetaData(mongo::OperationContext*) const ()
#9  0x0000000000cae9c6 in mongo::BSONCollectionCatalogEntry::getTotalIndexCount(mongo::OperationContext*) const ()
#10 0x00000000009bc0f1 in mongo::CmdDrop::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
#11 0x00000000009b71a4 in mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
#12 0x00000000009b80e3 in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) ()
#13 0x00000000009b8cdb in mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) ()
#14 0x0000000000b87f95 in mongo::runQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::NamespaceString const&, mongo::CurOp&, mongo::Message&, bool) ()
#15 0x0000000000a99f88 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&, bool) ()
#16 0x00000000007e6730 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
#17 0x0000000000ef8aab in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#18 0x00007fdd17d0f9d1 in start_thread () from /lib64/libpthread.so.0
#19 0x00007fdd16eb0b5d in clone () from /lib64/libc.so.6
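
The backtrace above covers a single thread. To capture every thread in one pass, gdb can be run non-interactively. The following is a rough sketch under stated assumptions, not the procedure used for this report: the helper names build_gdb_command and dump_all_backtraces and the 60-second timeout are hypothetical, gdb is assumed to be on PATH, and the caller is assumed to have permission to attach to the mongod process.

```python
import subprocess
from typing import List


def build_gdb_command(pid: int) -> List[str]:
    """Build a non-interactive gdb invocation that backtraces every thread."""
    return [
        "gdb",
        "-p", str(pid),                # attach to the running process
        "-batch",                      # run the commands, then exit
        "-ex", "thread apply all bt",  # print a backtrace for each thread
    ]


def dump_all_backtraces(pid: int, timeout_s: float = 60.0) -> str:
    """Attach to `pid` and return gdb's standard output as text.

    The target process is paused while gdb is attached, so keep the
    timeout short on a production server.
    """
    result = subprocess.run(
        build_gdb_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
    return result.stdout


# Example usage (6553 is the LWP of thread 1 in the listing above):
#     print(dump_all_backtraces(6553))
```

Batch mode also disables gdb's output pagination, so the captured text is not interrupted by pager prompts.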



 Comments   
Comment by Quentin Conner [ 19/Feb/15 ]

Resolved in rc8.
