At the moment it is possible for queries on the metadata table to block waiting for space in cache. That can lead to hangs, since some paths that query the metadata hold locks in the system.
I've seen a case of this with LSM trees. One thread is running a checkpoint: it holds the checkpoint lock and is waiting for the table lock. Meanwhile, another session is opening a table for the first time: it holds the table lock and is waiting for space in cache. Space never becomes available, so the system hangs.
Call stack of the cursor open operation:
Thread 109 (Thread 0x2b99764a0940 (LWP 27326)):
#0 0x00002b996bf24280 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000001a584cb in __wt_cond_wait_signal ()
#2 0x0000000001a2eb2c in __wt_cache_eviction_worker ()
#3 0x00000000019c10c8 in __cursor_func_init.constprop.15 ()
#4 0x00000000019c1818 in __wt_btcur_search ()
#5 0x0000000001a1329f in __curfile_search ()
#6 0x0000000001a77a41 in __wt_schema_open_table ()
#7 0x0000000001a75b08 in __wt_schema_get_table ()
#8 0x0000000001a2c4b1 in __wt_curtable_open ()
#9 0x0000000001a8a041 in __session_open_cursor_int ()
#10 0x0000000001a8a5f9 in __session_open_cursor ()
#11 0x000000000108a934 in mongo::WiredTigerSession::getCursor(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, bool) ()
#12 0x0000000001089280 in mongo::WiredTigerCursor::WiredTigerCursor(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, bool, mongo::OperationContext*) ()
#13 0x000000000107c95a in mongo::WiredTigerRecordStore::findRecord(mongo::OperationContext*, mongo::RecordId const&, mongo::RecordData*) const ()
#14 0x0000000000fcd126 in mongo::KVCatalog::_findEntry(mongo::OperationContext*, mongo::StringData, mongo::RecordId*) const ()
#15 0x0000000000fcd367 in mongo::KVCatalog::getMetaData(mongo::OperationContext*, mongo::StringData) ()
#16 0x0000000000fd2330 in mongo::KVCollectionCatalogEntry::_getMetaData(mongo::OperationContext*) const ()
#17 0x0000000000fb340a in mongo::BSONCollectionCatalogEntry::getAllIndexes(mongo::OperationContext*, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*) const ()
#18 0x00000000010976c4 in mongo::TTLMonitor::getTTLIndexesForDB(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> >*) ()
#19 0x00000000010990d1 in mongo::TTLMonitor::doTTLPass() ()
#20 0x0000000001099798 in mongo::TTLMonitor::run() ()
#21 0x00000000012a4ba0 in mongo::BackgroundJob::jobBody() ()
#22 0x0000000001b4e5f0 in execute_native_thread_routine ()
#23 0x00002b996bf1f83d in start_thread () from /lib64/libpthread.so.0
#24 0x00002b996c209fdd in clone () from /lib64/libc.so.6
Call stack of the checkpoint:
Thread 2 (Thread 0x2b99a2b7c940 (LWP 29574)):
#0 0x00002b996bf26654 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00002b996bf21f4a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x00002b996bf21e0c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000001a9cb71 in __wt_spin_lock_track ()
#4 0x0000000001a9dbce in __txn_checkpoint ()
#5 0x0000000001a9e7cf in __wt_txn_checkpoint ()
#6 0x0000000001a898ea in __session_checkpoint ()
#7 0x000000000108b746 in mongo::WiredTigerSessionCache::waitUntilDurable(bool) ()
#8 0x0000000001074c32 in mongo::WiredTigerKVEngine::flushAllFiles(bool) ()
#9 0x0000000000b0579c in mongo::FSyncLockThread::doRealWork() ()
#10 0x0000000000b07444 in mongo::FSyncLockThread::run() ()
#11 0x00000000012a4ba0 in mongo::BackgroundJob::jobBody() ()
#12 0x0000000001b4e5f0 in execute_native_thread_routine ()
#13 0x00002b996bf1f83d in start_thread () from /lib64/libpthread.so.0
#14 0x00002b996c209fdd in clone () from /lib64/libc.so.6
I think we should either add a check in __wt_cache_eviction_check for the btree being the metadata table, or always set the WT_SESSION_NO_EVICTION flag on sessions when they are using the metadata. The latter is likely to require more invasive code changes.
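To make the two options concrete, here is a rough, self-contained model of the decision. It is only a sketch: the `sketch_*` types, the `is_metadata` field, and `SKETCH_SESSION_NO_EVICTION` are hypothetical stand-ins invented for illustration, not the real WiredTiger structures or the real signature of __wt_cache_eviction_check. The only point is that a session working on the metadata table, or one carrying the no-eviction flag, is never told to wait for cache space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the WT_SESSION_NO_EVICTION session flag. */
#define SKETCH_SESSION_NO_EVICTION 0x1u

/* Hypothetical btree handle; is_metadata marks the metadata table. */
struct sketch_btree {
    bool is_metadata;
};

/* Hypothetical session: its flags and the btree it is operating on. */
struct sketch_session {
    uint32_t flags;
    const struct sketch_btree *btree;
};

/* Hypothetical cache accounting. */
struct sketch_cache {
    uint64_t bytes_inuse;
    uint64_t bytes_max;
};

/*
 * Return true if the caller should block until cache space is available.
 * Sessions flagged no-eviction (option 2) and sessions working on the
 * metadata table (option 1) never block, so they cannot stall while
 * holding the table lock.
 */
static bool
sketch_eviction_check(
    const struct sketch_session *session, const struct sketch_cache *cache)
{
    assert(session != NULL && cache != NULL);

    /* Option 2: the session was marked as exempt from eviction waits. */
    if (session->flags & SKETCH_SESSION_NO_EVICTION)
        return (false);

    /* Option 1: exempt any operation on the metadata table's btree. */
    if (session->btree != NULL && session->btree->is_metadata)
        return (false);

    /* Everyone else waits once the cache is full. */
    return (cache->bytes_inuse >= cache->bytes_max);
}
```

With option 1 the exemption lives in one place and callers stay unchanged. With option 2 every code path that touches the metadata has to set and clear the flag, which is why it looks more invasive.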