[SERVER-18014] Dropping a collection can block creating a new collection for an extended time under WiredTiger
Created: 13/Apr/15  Updated: 15/Dec/15  Resolved: 27/Apr/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.1
Fix Version/s: 3.0.3, 3.1.3

Type: Bug Priority: Critical - P2
Reporter: Bruce Lucas (Inactive) Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 0
Labels: ET
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File create.png     PNG File freeing.png     PNG File linux-02.png     PNG File linux-03.png     PNG File patch-56-sync-drop.png     HTML File procdump.html    
Issue Links:
Related
is related to SERVER-17907 B-tree eviction blocks access to coll... Closed
Backwards Compatibility: Fully Compatible

 Description   
Issue Status as of Apr 29, 2015

ISSUE SUMMARY
Because of an interaction between dropping and creating collections in the WiredTiger storage engine, dropping a collection and freeing the resources it used can block the creation of new collections.

USER IMPACT
Workloads that involve large volumes of collection drop operations may experience performance degradation.

WORKAROUNDS
None.

AFFECTED VERSIONS
3.0.0, 3.0.1, 3.0.2

FIX VERSION
The fix is included in the 3.0.3 production release.

Original description

Dropping a collection can take a long time under WiredTiger because it can require freeing a large number of allocated buffers. See SERVER-17907 for information on reproducing.

While this is occurring, creating a new collection may be blocked for a substantial time. In this test the create was accomplished by inserting a record into a non-existent collection, so the stack traces below show the createCollection happening within insertOne. It is blocked in two different places:

  • From A to B it is blocked waiting for the db lock. This occurs synchronously with the drop command itself, which is busy freeing buffers.
  • From B to C it is blocked waiting to update the metadata table. This occurs synchronously with dropAllQueued (not shown in the screenshot below), which is also busy freeing buffers.

The total timeline below is about 3.5 minutes, so the total time blocked is about 2 minutes.
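To make the blocking pattern concrete, here is a minimal Python sketch under simplifying assumptions: a single hypothetical `metadata_lock` stands in for both the db lock and the metadata-table lock, and `time.sleep` stands in for freeing buffers. The `Catalog` class and its methods are illustrative only, not MongoDB or WiredTiger code.

```python
import threading
import time


class Catalog:
    """Toy catalog where one lock (hypothetical) guards collection metadata."""

    def __init__(self):
        self.metadata_lock = threading.Lock()
        self.collections = {}

    def drop_sync(self, name, free_cost_s, locked=None):
        # Models the pre-fix behaviour: the lock stays held while buffers are freed.
        with self.metadata_lock:
            if locked is not None:
                locked.set()  # signal that the drop now owns the lock
            pages = self.collections.pop(name)
            time.sleep(free_cost_s)  # stand-in for freeing many cached buffers
            pages.clear()

    def create(self, name):
        # Returns how long the caller waited to acquire the lock.
        start = time.monotonic()
        with self.metadata_lock:
            waited = time.monotonic() - start
            self.collections[name] = []
        return waited


def measure_create_wait(free_cost_s=0.2):
    cat = Catalog()
    cat.collections["old"] = list(range(100_000))
    locked = threading.Event()
    dropper = threading.Thread(target=cat.drop_sync, args=("old", free_cost_s, locked))
    dropper.start()
    locked.wait()  # make sure the drop holds the lock before we try to create
    waited = cat.create("new")
    dropper.join()
    return waited
```

Under these assumptions, `measure_create_wait(0.2)` should report a wait of roughly 0.2 seconds, because the create cannot proceed until the drop releases the lock.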



 Comments   
Comment by Githook User [ 18/May/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Don't try to checkpoint dead handles.

A handle may be busy when a checkpoint starts (e.g., in the middle of a bulk load) and then be dead by the time the checkpoint visits it (e.g., because a forced drop happened after the checkpoint started).

refs SERVER-18014
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/c207b64a1f7be98ea65a3ae95407c4da1353498a
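The guard this commit describes can be modelled in a few lines of Python. This is a hypothetical sketch, not WiredTiger's C implementation: `Handle`, `checkpoint`, and the `on_visit` hook are invented names, and the hook exists only to inject a forced drop after the checkpoint has taken its snapshot of handles.

```python
from dataclasses import dataclass


@dataclass
class Handle:
    name: str
    dead: bool = False
    checkpointed: bool = False


def checkpoint(handles, on_visit=None):
    """Checkpoint every handle captured at start, skipping ones that died since."""
    visited = []
    for h in list(handles):  # snapshot of the handle list taken when the checkpoint starts
        if on_visit is not None:
            on_visit(h)  # test hook: lets a concurrent forced drop interleave here
        if h.dead:
            continue  # the guard from the commit message: never checkpoint a dead handle
        h.checkpointed = True
        visited.append(h.name)
    return visited


def interleaved_drop():
    a, bulk, c = Handle("a"), Handle("bulk"), Handle("c")

    def forced_drop(h):
        # A forced drop of the bulk-loaded table lands after the checkpoint has started.
        if h is a:
            bulk.dead = True

    return checkpoint([a, bulk, c], forced_drop), bulk
```

In `interleaved_drop`, the bulk-loaded handle is marked dead after the checkpoint starts, so the checkpoint skips it and visits only `a` and `c`.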

Comment by Githook User [ 18/May/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Add forced drops to test/fops. We have seen failures where a checkpoint interleaves between a bulk load and a forced drop and then tries to operate on a dead handle.

refs SERVER-18014
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/80cc29d3531500a529cf044f9419195fe54f9a47

Comment by Githook User [ 27/Apr/15 ]

Author: Alex Gorrod (agorrod) <alexg@wiredtiger.com>

Message: If only getting a handle lock, don't propagate WT_NOTFOUND.

It's expected after the background drop changes.
Refs SERVER-18014
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/5991f88fefb1a5989f9b3633b7cd5c0dc1d57854

Comment by Githook User [ 27/Apr/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Close block manager handles as soon as a handle is marked dead.

refs SERVER-18014
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/f1d1d01d0759263b8f18719f63f08696a15e9f91

Comment by Githook User [ 27/Apr/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Discard trees from cache in the background.

We used to keep handles locked while freeing their pages from cache (either for drops or when sweeping old handles). If an application thread attempted to open a cursor during one of these operations, it was forced to wait until the discard completed.

With this change, handles are marked "dead", and readers will no longer use them. The sweep server will later discard dead trees from cache in the background, without holding any locks that application threads should block on.

refs SERVER-17907, SERVER-18014
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/440cbc76902432eb233b8a8bda1df1265bdd6e46
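The background-discard design in this commit can be sketched as a toy Python model. All names (`Connection`, `drop`, `sweep`, `open_cursor`) are hypothetical, a single lock stands in for whatever locks WiredTiger actually uses, and `time.sleep` stands in for freeing pages from cache. It illustrates the point of the commit message: a drop only marks the handle dead under the lock, and the expensive freeing happens later in a sweep thread that holds no lock.

```python
import threading
import time


class Handle:
    def __init__(self, name, pages):
        self.name = name
        self.pages = pages
        self.dead = False


class Connection:
    """Toy model of background discard; all names here are hypothetical."""

    def __init__(self):
        self.lock = threading.Lock()  # guards the handle table only
        self.handles = {}
        self.dead_handles = []

    def drop(self, name):
        # Cheap: unlink the handle and mark it dead; no pages are freed here.
        with self.lock:
            h = self.handles.pop(name)
            h.dead = True
            self.dead_handles.append(h)

    def open_cursor(self, name):
        # Readers never use a dead handle.
        with self.lock:
            h = self.handles.get(name)
            return None if h is None or h.dead else h

    def create(self, name):
        # Returns how long the caller waited to acquire the lock.
        start = time.monotonic()
        with self.lock:
            waited = time.monotonic() - start
            self.handles[name] = Handle(name, [])
        return waited

    def sweep(self, free_cost_s, detached=None):
        # Sweep-server stand-in: detach dead handles under the lock,
        # then do the expensive freeing with no lock held.
        with self.lock:
            victims, self.dead_handles = self.dead_handles, []
        if detached is not None:
            detached.set()
        for h in victims:
            time.sleep(free_cost_s)  # stand-in for freeing cached pages
            h.pages.clear()
        return len(victims)


def demo(free_cost_s=0.3):
    conn = Connection()
    conn.handles["old"] = Handle("old", list(range(100_000)))
    conn.drop("old")
    detached = threading.Event()
    sweeper = threading.Thread(target=conn.sweep, args=(free_cost_s, detached))
    sweeper.start()
    detached.wait()  # the sweeper is now freeing pages in the background
    waited = conn.create("new")
    still_busy = sweeper.is_alive()
    sweeper.join()
    return waited, still_busy, conn.open_cursor("old"), conn.open_cursor("new")
```

In `demo`, the create finishes almost immediately even though the sweeper is still freeing the dropped tree, and `open_cursor("old")` returns `None` because readers never use a dead handle.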

Comment by Bruce Lucas (Inactive) [ 23/Apr/15 ]

Thanks michael.cahill, that build does the trick. The drop command doesn't show up at all in the stack trace sample (because it happened too quickly); a short time later, the sweep server is seen spending a bunch of time in __wt_conn_btree_sync_and_close (most of that in free, because this was Windows), and operations such as listDatabases and createCollection are not blocked. Ship it!

Comment by Githook User [ 16/Apr/15 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: Discard trees from cache in the background.

We used to keep handles locked while freeing their pages from cache (either for drops or when sweeping old handles). If an application thread attempted to open a cursor during one of these operations, it was forced to wait until the discard completed.

With this change, handles are marked "dead", and readers will no longer use them. The sweep server will later discard dead trees from cache in the background, without holding any locks that application threads should block on.

refs SERVER-17907, SERVER-18014
Branch: tree-discard-background
https://github.com/wiredtiger/wiredtiger/commit/440cbc76902432eb233b8a8bda1df1265bdd6e46

Generated at Thu Feb 08 03:46:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.