[SERVER-26425] Top entries may remain if write conflict occurs during dropDatabase
Created: 30/Sep/16  Updated: 31/Oct/16  Resolved: 25/Oct/16

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: 3.3.15
Fix Version/s: 3.4.0-rc2

Type: Bug
Priority: Major - P3
Reporter: Kyle Suarez
Assignee: Kyle Suarez
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: repro.patch
Issue Links:
Related
related to SERVER-26750 Don't log a Top entry in ~AutoGetColl... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Apply the attached repro.patch, then run

$ python buildscripts/resmoke.py --executor no_passthrough_with_mongod jstests/noPassthroughWithMongod/top_drop.js

Sprint: Integration 2016-10-10, Integration 2016-10-31
Participants:
Linked BF Score: 0

 Description   

When a database is dropped, dropDatabase() removes each collection from Top. If a write conflict exception occurs before or during this loop, entries may remain in Top even after dropDatabase() finishes successfully.



 Comments   
Comment by Githook User [ 25/Oct/16 ]

Author: Kyle Suarez (ksuarz) <kyle.suarez@mongodb.com>

Message: SERVER-26425 perform dropDatabase in WriteConflictException retry loop
Branch: master
https://github.com/mongodb/mongo/commit/1e6fe6df6941e97c73db086e6ec7ebb24bc7dec9

Comment by Kyle Suarez [ 05/Oct/16 ]

While trying to reproduce this, I enabled the WTWriteConflictException failpoint and introduced a sleep where Database::dropDatabase() clears entries from Top. Unfortunately, I hit something unexpected: in debug builds, background jobs run every second instead of every minute. If the sleep is too long, we reach the cursor timeout limit, and the job that times out cursors tries to acquire a read lock on the collection. Once dropDatabase() relinquishes its lock, the read lock is acquired, and an entry for the collection is logged into Top when AutoGetCollectionForRead goes out of scope. Here is a stack trace:

 src/mongo/db/stats/top.cpp:96:0: mongo::Top::record(mongo::OperationContext*, mongo::StringData, mongo::LogicalOp, int, long long, bool, mongo::Command::ReadWriteType)
 src/mongo/db/db_raii.cpp:121:0: mongo::AutoGetCollectionForRead::~AutoGetCollectionForRead()
 src/mongo/db/catalog/cursor_manager.cpp:278:0: mongo::GlobalCursorIdCache::timeoutCursors(mongo::OperationContext*, int)
 src/mongo/db/catalog/cursor_manager.cpp:291:0: mongo::CursorManager::timeoutCursorsGlobal(mongo::OperationContext*, int)
 src/mongo/db/clientcursor.cpp:271:0: mongo::ClientCursorMonitor::run()
 src/mongo/util/background.cpp:151:0: mongo::BackgroundJob::jobBody()

I am still confident that adding a retry loop to KVStorageEngine::dropDatabase() is a needed improvement, but in addition, we should update top_drop.js to be more resilient against unexpected entries being added to Top because of other threads.
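
To illustrate the retry-loop idea in isolation, here is a minimal, self-contained sketch. It is not the code from the commit and does not use the server's types: ToyWriteConflict, ToyTop, dropOnce, and dropWithRetry are hypothetical names, and the conflictsLeft counter only simulates a failpoint. The sketch assumes that removing a Top entry is safe to repeat, so rerunning the whole drop after a conflict eventually clears every entry.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the server's write conflict exception.
struct ToyWriteConflict : std::runtime_error {
    ToyWriteConflict() : std::runtime_error("write conflict") {}
};

// Toy model of Top: the set of namespaces that currently have an entry.
struct ToyTop {
    std::set<std::string> entries;
    void collectionDropped(const std::string& ns) { entries.erase(ns); }
};

// One attempt at clearing the entries. While conflictsLeft is positive, the
// attempt throws just before handling the second namespace, which simulates
// a write conflict in the middle of the loop.
void dropOnce(ToyTop& top, const std::vector<std::string>& namespaces, int& conflictsLeft) {
    for (std::size_t i = 0; i < namespaces.size(); ++i) {
        if (i == 1 && conflictsLeft > 0) {
            --conflictsLeft;
            throw ToyWriteConflict();
        }
        top.collectionDropped(namespaces[i]);
    }
}

// Reruns the whole attempt until it completes without a conflict and returns
// the number of attempts made.
int dropWithRetry(ToyTop& top, const std::vector<std::string>& namespaces, int conflictsLeft) {
    int attempts = 0;
    for (;;) {
        ++attempts;
        try {
            dropOnce(top, namespaces, conflictsLeft);
            return attempts;
        } catch (const ToyWriteConflict&) {
            // Swallow the conflict and start the attempt again from the top.
        }
    }
}
```

A single call to dropOnce with a pending conflict throws and leaves the later namespaces in ToyTop, which is a simplified picture of the stale entries described in this ticket. Calling dropWithRetry with the same input ends with an empty set.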
