[SERVER-22554] WiredTiger data handles not closed when collection is dropped Created: 10/Feb/16  Updated: 11/Mar/16  Resolved: 23/Feb/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.8, 3.0.9
Fix Version/s: 3.0.10

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Alexander Gorrod
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File dhandles.png     PNG File dhandles_321.png    
Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

The test case creates collections and indexes and then drops them at a high rate through point A below. WiredTiger (WT) data handles accumulate and are never closed.

The problem reproduces in several 3.0 versions, including 3.0.8 and 3.0.9, but not in 3.2.1.

Repro script:

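// Create 100 collections whose names include t, index each one, then drop it.
function create(t) {
    for (var i = 0; i < 100; i++) {
        // Declare c locally so it does not leak into the shell's global scope.
        var c = db['c.' + t + '.' + i];
        c.insert({});
        c.createIndex({x: 1});
        c.createIndex({y: 1});
        c.createIndex({z: 1});
        c.drop();
        // Progress output every 100 iterations.
        if (i % 100 == 0)
            print(i);
    }
}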



 Comments   
Comment by Githook User [ 23/Feb/16 ]

Author: Ramon Fernandez <ramon@mongodb.com>

Message: Import wiredtiger-wiredtiger-mongodb-3.0.9-2-g62b3ca8.tar.gz from wiredtiger branch mongodb-3.0

ref: cae5fcf..62b3ca8

SERVER-22554 WiredTiger data handles not closed when collection is dropped
Branch: v3.0
https://github.com/mongodb/mongo/commit/79b1c655f28b44f9d055ced9eddb22fa49b4d251

Comment by Githook User [ 22/Feb/16 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Merge pull request #2514 from wiredtiger/server-22554-backport

SERVER-22554 Fix a reference counting bug in dhandles.
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/62b3ca8a7a07287205fca35bc49c24db121c0855

Comment by Githook User [ 22/Feb/16 ]

Author: Alex Gorrod (agorrod) <alexg@wiredtiger.com>

Message: SERVER-22554 Fix a reference counting bug in dhandles.

The code has diverged a lot between 3.0 and the latest develop, and
a backport included half of a reference counting change. This
patches up the reference counting in 3.0. The problem being solved
is:
There are two phases to opening a handle in a session for the first
time. First retrieve or open the handle from the connection cache,
then add it to the session cache. The code was structured to split
those into two separate phases (one while holding a lock the other
after the lock had been released). The reference count needs to be
bumped while the lock is being held, the session cache doesn't need
to be updated while holding the lock.

This change does both while holding the lock to keep reference count
tracking sane and correct.
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/39adc781ce9de9f4e562015cc9cccc70eb5e9e3e
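
Editorial sketch (not part of the commit): the approach described in the message above, taking the reference and updating the session cache in one critical section, might look roughly like the following. Every name here (toy_dhandle, toy_connection, toy_session, toy_session_open, toy_session_release) is a hypothetical stand-in and not WiredTiger's actual structures or API; the real code is in the linked commit.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not WiredTiger's real structures. */
typedef struct {
    const char *name;
    uint32_t session_ref;   /* number of session caches referencing this handle */
} toy_dhandle;

typedef struct {
    pthread_mutex_t lock;   /* protects the handle and its reference count */
    toy_dhandle handle;     /* a single cached handle keeps the sketch short */
} toy_connection;

typedef struct {
    toy_dhandle *cached;    /* session-local cache, at most one handle here */
} toy_session;

/*
 * Open the handle in a session. The reference bump and the session cache
 * update happen under the same lock, so the count always matches the
 * number of session caches that actually hold the handle.
 */
static void
toy_session_open(toy_connection *conn, toy_session *session)
{
    pthread_mutex_lock(&conn->lock);
    if (session->cached == NULL) {
        conn->handle.session_ref++;
        session->cached = &conn->handle;
    }
    pthread_mutex_unlock(&conn->lock);
}

/* Drop the session's reference, again under the lock. */
static void
toy_session_release(toy_connection *conn, toy_session *session)
{
    pthread_mutex_lock(&conn->lock);
    if (session->cached != NULL) {
        session->cached->session_ref--;
        session->cached = NULL;
    }
    pthread_mutex_unlock(&conn->lock);
}
```

In this toy model, opening the same handle twice from one session leaves session_ref at 1, and releasing it returns the count to 0, which is the invariant the sweep relies on before closing a handle.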

Comment by Alexander Gorrod [ 12/Feb/16 ]

It appears as though this is related to an incomplete backport of either WT-2038 or WT-1598.

What I'm seeing is that the forced drops are being issued to WiredTiger, and WiredTiger is queuing the handles to be closed by the sweep server. The sweep server is repeatedly attempting the drops but the session_ref field is non-zero, so it never finalizes the drop. When I look through all of the sessions' handle caches in a debugger, none of them are referencing the handles.

Part of this is that there is a change to the reference counting of dhandles here:
https://jira.mongodb.org/browse/WT-2038?focusedCommentId=1005420&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1005420

That should have removed a bump to session_ref that is present in __session_add_dhandle. The code change for that is:

--- a/src/third_party/wiredtiger/src/session/session_dhandle.c
+++ b/src/third_party/wiredtiger/src/session/session_dhandle.c
@@ -31,8 +31,6 @@ __session_add_dhandle(
        if (dhandle_cachep != NULL)
                *dhandle_cachep = dhandle_cache;
 
-       (void)__wt_atomic_add32(&session->dhandle->session_ref, 1);
-
        /* Sweep the handle list to remove any dead handles. */
        return (__session_dhandle_sweep(session));
 }

Making just that change and testing locally wasn't enough to resolve the issue, so I'll need to spend more time figuring out the missing piece.
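
Editorial sketch (not from the ticket): the symptom described in this comment, a sweep that keeps retrying but never closes a handle whose session_ref is stuck above zero, can be modeled in a few lines. The toy_handle type and toy_sweep function are hypothetical and only illustrate the check; they are not the WiredTiger sweep server's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle state; not WiredTiger's real data handle structure. */
typedef struct {
    uint32_t session_ref;   /* sessions believed to reference the handle */
    bool dead;              /* queued for close, for example by a drop */
    bool open;              /* still consuming resources */
} toy_handle;

/*
 * One sweep pass: close every dead handle that no session references.
 * Returns the number of handles closed in this pass.
 */
static size_t
toy_sweep(toy_handle *handles, size_t n)
{
    size_t closed = 0;

    for (size_t i = 0; i < n; i++) {
        toy_handle *h = &handles[i];

        if (!h->open || !h->dead)
            continue;
        /* A non-zero count means "still in use": skip and retry later. */
        if (h->session_ref != 0)
            continue;
        h->open = false;
        closed++;
    }
    return closed;
}
```

If a reference is added without a matching release, session_ref never returns to zero, so every pass skips that handle and it stays open indefinitely, even though no session cache actually points at it.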
