[SERVER-26870] Sometimes collection data file is not removed even though collection is dropped Created: 02/Nov/16 Updated: 08/Feb/23 Resolved: 07/Dec/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.2.10 |
| Fix Version/s: | 3.4.1, 3.5.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | 아나 하리 | Assignee: | Geert Bosch |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Backport Requested: | v3.2 |
| Steps To Reproduce: |
|
| Sprint: | Storage 2016-12-12 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 0 |
| Description |
|
Once mongod is restarted with Ctrl+c, |
| Comments |
| Comment by Githook User [ 08/Dec/16 ] |
|
Author: Geert Bosch (GeertBosch, geert@mongodb.com) Message: (cherry picked from commit 02fa55abc653d1356ade3f6365d9d02de7f6113f) |
| Comment by Githook User [ 07/Dec/16 ] |
|
Author: Geert Bosch (GeertBosch, geert@mongodb.com) Message: |
| Comment by Geert Bosch [ 07/Dec/16 ] |
|
The problem is in the way we're caching WiredTigerCursor and WiredTigerSession objects. We cache cursors in each session, and also cache sessions globally. Most operations in MongoDB are short-lived, so this tends not to be a big problem: when we get a WT_BUSY, we go through the caches and retire old cursors. We also always reset the cursors, so they don't hold on to much. However, some internal clients have very long-running operations that could run forever. Such clients may hold cached cursors that they never give back to the cache, and those cursors will never be freed.

In this ticket the NoopWriter thread (which periodically writes to the oplog for tracking delays) was particularly unlucky and inherited the session previously used by the initandlisten thread. That session was used for checking the last written records in all collections, and thus has cursors on all of them. Because the NoopWriter holds on to its OperationContext forever, its WiredTigerSession is never returned to the WiredTigerSessionCache, and thus its cursors are never retired. This specific case can be fixed by creating a new OperationContext for each individual write.

In general, we don't always know whether a thread is about to sleep for a long time and should return its cached resources, or whether it should hold on to them. There are also problems in pushing the caching down to the WiredTiger layer itself, because WiredTiger allows many different configurations and it would be hard to know when it's OK to reuse a cursor. A good general solution would make all cursors available for reclaiming when they're cached. When about to drop an ident with a given id, we should reclaim all cursors for that id (and that id alone) before attempting the operation. If we still encounter WT_BUSY, we should queue the drop as we do now. |
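The general solution proposed above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual server or WiredTiger code: `Session`, `Engine`, `Busy`, `cache_cursor`, `reclaim`, `drop_ident`, and the ident names are all hypothetical, and `Busy` merely stands in for a WT_BUSY result. It shows a long-lived session whose cached cursors block a drop, and how reclaiming cursors for that one ident first lets the drop proceed while other cached cursors are left alone.

```python
from collections import defaultdict


class Busy(Exception):
    """Hypothetical stand-in for a WT_BUSY result."""


class Session:
    """Toy session that keeps reset-but-open cursors cached per ident."""

    def __init__(self):
        self.cached = defaultdict(int)  # ident -> count of cached cursors

    def cache_cursor(self, ident):
        self.cached[ident] += 1

    def reclaim(self, ident):
        # Close cached cursors for this ident only; others stay cached.
        return self.cached.pop(ident, 0)


class Engine:
    """Toy engine that refuses to drop an ident while a cursor is open on it."""

    def __init__(self, idents):
        self.idents = set(idents)
        self.sessions = []      # includes sessions held by long-running threads
        self.queued_drops = []  # drops to retry later

    def new_session(self):
        session = Session()
        self.sessions.append(session)
        return session

    def _drop(self, ident):
        if any(s.cached.get(ident, 0) > 0 for s in self.sessions):
            raise Busy(ident)
        self.idents.discard(ident)

    def drop_ident(self, ident, reclaim_first=True):
        if reclaim_first:
            # Reclaim cached cursors for this ident (and this ident alone).
            for s in self.sessions:
                s.reclaim(ident)
        try:
            self._drop(ident)
            return True
        except Busy:
            # Still busy: queue the drop for a later retry.
            self.queued_drops.append(ident)
            return False


engine = Engine({"collection-7", "collection-9"})
noop_writer = engine.new_session()  # models a session that is never returned
noop_writer.cache_cursor("collection-7")
noop_writer.cache_cursor("collection-9")

stuck = engine.drop_ident("collection-7", reclaim_first=False)  # gets queued
dropped = engine.drop_ident("collection-7")                     # succeeds
```

In this model the first call mirrors the reported behavior (the drop is queued because a cached cursor is still open), while the second call reclaims only the cursors on the ident being dropped. A real implementation would also have to handle cursors that are actively in use, which this sketch does not model.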
| Comment by Ramon Fernandez Marina [ 01/Dec/16 ] |
|
Thanks for reporting this issue, matt.lee. We're able to reproduce it and we're investigating. |
| Comment by 아나 하리 [ 02/Nov/16 ] |
|
Hi Thomas. Would you please clarify the cluster's configuration? Would you clarify how frequently this issue occurs in your reproduction attempts? Regards, |
| Comment by Kelsey Schubert [ 02/Nov/16 ] |
|
Hi matt.lee, Thanks for the report with the reproduction steps. I have a few questions to get a better understanding of what is happening here. It is possible that you may be encountering
Kind regards, |