|
In my attempts to reproduce this, I enabled the WTWriteConflictException failpoint and introduced a sleep at the point where Database::dropDatabase() clears entries from Top. Unfortunately, I hit something unexpected: in debug builds, background jobs run every second instead of every minute. If the sleep is too long, we hit the cursor timeout limit, and the job that times out cursors acquires a read lock on the collection. Once dropDatabase() relinquishes its lock, the read lock is granted, and an entry for the collection is recorded in Top when AutoGetCollectionForRead goes out of scope. Here is the stack trace:
src/mongo/db/stats/top.cpp:96:0: mongo::Top::record(mongo::OperationContext*, mongo::StringData, mongo::LogicalOp, int, long long, bool, mongo::Command::ReadWriteType)
src/mongo/db/db_raii.cpp:121:0: mongo::AutoGetCollectionForRead::~AutoGetCollectionForRead()
src/mongo/db/catalog/cursor_manager.cpp:278:0: mongo::GlobalCursorIdCache::timeoutCursors(mongo::OperationContext*, int)
src/mongo/db/catalog/cursor_manager.cpp:291:0: mongo::CursorManager::timeoutCursorsGlobal(mongo::OperationContext*, int)
src/mongo/db/clientcursor.cpp:271:0: mongo::ClientCursorMonitor::run()
src/mongo/util/background.cpp:151:0: mongo::BackgroundJob::jobBody()
|
I am still confident that adding a retry loop to KVStorageEngine::dropDatabase() is a needed improvement. In addition, we should update top_drop.js to be more resilient to unexpected entries that other threads add to Top.
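
To illustrate the kind of retry loop I have in mind, here is a minimal, standalone sketch. It does not use the server's actual exception type or retry helpers: WriteConflict, retryOnWriteConflict, and the maxAttempts bound are hypothetical names invented for this example, and the real change in KVStorageEngine::dropDatabase() may well look different.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the storage engine's write-conflict exception.
struct WriteConflict : std::exception {
    const char* what() const noexcept override { return "write conflict"; }
};

// Calls `op` repeatedly until it finishes without throwing WriteConflict.
// Returns the number of attempts that were needed. After `maxAttempts`
// failed attempts the last WriteConflict is rethrown. Any other exception
// propagates immediately without a retry.
inline int retryOnWriteConflict(const std::function<void()>& op, int maxAttempts) {
    if (maxAttempts < 1) {
        throw std::invalid_argument("maxAttempts must be at least 1");
    }
    for (int attempt = 1;; ++attempt) {
        try {
            op();
            return attempt;
        } catch (const WriteConflict&) {
            if (attempt >= maxAttempts) {
                throw;  // out of attempts: surface the conflict to the caller
            }
            // Otherwise fall through and try the operation again.
        }
    }
}
```

The drop itself would be passed in as `op`. Whether the real loop should be bounded, or should back off between attempts, is a separate decision that this sketch does not try to settle.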
|