[SERVER-22209] Collection creation during final phase of checkpoint holds database lock for extended time Created: 16/Jan/16 Updated: 07/Dec/16 Resolved: 04/Apr/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.5, 3.3.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Michael Cahill (Inactive) |
| Resolution: | Done | Votes: | 2 |
| Labels: | WTplaybook, code-and-test |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Participants: | |
| Description |
|
Here's an example:
Stack traces collected during the stall show that the blocked operation is a create collection (this was not evident from the stats and required stack traces because this was an implicit create and we don't have a counter for that), and that it is blocked in __wt_session_create:
Stack traces also confirm multiple threads queued up behind db lock:
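The queuing pattern described above can be modeled with a toy sketch. This is not MongoDB or WiredTiger code: the lock, the `simulate` function, and the thread names are all hypothetical stand-ins, and the sleep merely imitates a create that stalls while holding the database lock.

```python
import threading
import time


def simulate(hold_seconds=0.2, writers=3):
    """Toy model: one slow 'create collection' holds a lock, writers queue behind it."""
    db_lock = threading.Lock()    # hypothetical stand-in for the database lock
    holding = threading.Event()   # set once the creator owns the lock
    log = []                      # order in which operations completed
    waits = {}                    # seconds each writer spent waiting for the lock

    def creator():
        # Models the implicit create that stalls while holding the lock.
        with db_lock:
            holding.set()
            time.sleep(hold_seconds)
            log.append("create collection finished")

    def writer(name):
        # Models an unrelated operation that needs the same lock.
        begin = time.monotonic()
        with db_lock:
            waits[name] = time.monotonic() - begin
            log.append(name)

    t_create = threading.Thread(target=creator)
    t_create.start()
    holding.wait()  # start writers only after the creator holds the lock

    threads = [
        threading.Thread(target=writer, args=(f"writer-{i}",))
        for i in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads + [t_create]:
        t.join()
    return log, waits


if __name__ == "__main__":
    log, waits = simulate()
    print(log)
    print({name: round(seconds, 3) for name, seconds in waits.items()})
```

In this model every writer's wait is roughly the full hold time of the creator, which is the shape of stall the traces point to: the operations behind the lock are cheap, but they cannot start until the create returns.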
|
| Comments |
| Comment by Andrew de Quincey [ 23/Apr/16 ] |
|
You're right. Thanks for looking into it though! |
| Comment by Ramon Fernandez Marina [ 22/Apr/16 ] |
|
adq@tvsquared.com, a backport to the 3.0 series is not feasible, but that should not affect you: I'd recommend you upgrade to 3.0+MMAPv1, and move to 3.2+WiredTiger from there; that way you avoid the configuration affected by this issue (3.0 + WiredTiger). |
| Comment by Ramon Fernandez Marina [ 26/Jan/16 ] |
|
adq, we need to wait and see how the fix looks for the development branch before we can decide if it's safe to backport to older branches. The decision will be reflected in changes to the "fixVersion" field. Until then I'm requesting a backport to v3.0 on your behalf. |
| Comment by Andrew de Quincey [ 26/Jan/16 ] |
|
Will this fix be backported to the 3.0 series as well? We're still on 2.6, so we need to upgrade to 3.0, and then on to 3.2. If this fix isn't backported, there's a window when we'll be running a pure 3.0 system on live at risk of these long lock times. |
| Comment by Alan Jackson [ 22/Jan/16 ] |
|
Hi Ramon, thanks for getting back so quickly. |
| Comment by Ramon Fernandez Marina [ 22/Jan/16 ] |
|
ajax@tvsquared.com, this issue only affects the WiredTiger storage engine. If you're considering upgrading, note that you can still upgrade to MongoDB 3.2 and continue to use the MMAPv1 storage engine. This will allow you to use all the new features in 3.2 (document validation, partial indexes, aggregation improvements) and set the stage for a transition to the WiredTiger storage engine once this issue is resolved. |
| Comment by Alan Jackson [ 22/Jan/16 ] |
|
This issue (and possibly related index creation locks that behave identically) is currently keeping us on MongoDB 2.6. |
| Comment by Alexander Gorrod [ 21/Jan/16 ] |
|
michael@aorato.com We are actively pursuing a fix for this problem. We have isolated and understand the cause for the problem. We have created a reproducing case inside WiredTiger (see |
| Comment by Michael Dolinsky [ 21/Jan/16 ] |
|
This is a critical fix for us: we create and drop collections on an hourly basis, and because of the stall we could lose data that would otherwise be inserted. Thank you. |