[SERVER-25012] createIndex blocks for duration of checkpoint while holding locks Created: 12/Jul/16 Updated: 06/Jul/17 Resolved: 20/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.2.5, 3.2.7 |
| Fix Version/s: | 3.2.12, 3.3.14 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Sulabh Mahajan |
| Resolution: | Done | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Backport Requested: | v3.2 |
| Participants: | |
| Description |
|
The test runs inserts to generate checkpoints that last about 30 seconds, while calling createIndex about once per second:
The createIndex calls stall during checkpoints, which in turn stalls other operations because createIndex holds some locks while it waits. Stack traces collected during the stall show createIndex stuck waiting on a mutex in __wt_curfile_open:
Here's the repro code:
|
| Comments |
| Comment by Githook User [ 22/Dec/16 ] |
|
Author: Sulabh Mahajan (sulabhM) <sulabh.mahajan@mongodb.com> Message: (cherry picked from commit d8ac57088d8eae13cf4bed7d9232bddea27e8052) |
| Comment by Githook User [ 20/Sep/16 ] |
|
Author: Sulabh Mahajan (sulabhM) <sulabh.mahajan@mongodb.com> Message: |
| Comment by Alexander Gorrod [ 19/Jul/16 ] |
|
michael.cahill I've opened |
| Comment by Michael Cahill (Inactive) [ 18/Jul/16 ] |
|
bruce.lucas, this was a conscious choice at some point in the WiredTiger layer between blocking and returning an EBUSY error. Bulk loads require exclusive access to the tree and enforce this with an exclusive handle lock. Unfortunately, checkpoints acquire shared locks on all handles, leading to the conflict. It happens that MongoDB can deal with an EBUSY in this situation by opening a non-bulk cursor, but it is potentially surprising that a background checkpoint can cause bulk loads to fail. One solution would be a "bulk=try" flag that first tries to open a bulk cursor, and if that would fail with EBUSY, falls back to an ordinary cursor. |
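
To make the proposal in the comment above concrete, here is a minimal sketch of the "try bulk, fall back on EBUSY" idea. It is a toy model in Python, not WiredTiger code and not the change that was eventually committed: `Handle`, `open_bulk_cursor`, and `open_cursor_bulk_try` are hypothetical names, and the handle lock is reduced to simple counters that assume a single thread.

```python
import errno


class Handle:
    """Toy data handle with a shared/exclusive lock (hypothetical, not WiredTiger code)."""

    def __init__(self):
        self.shared_holders = 0   # e.g. a running checkpoint
        self.exclusive = False    # e.g. an open bulk cursor

    def try_lock_shared(self):
        if self.exclusive:
            return False
        self.shared_holders += 1
        return True

    def unlock_shared(self):
        self.shared_holders -= 1

    def try_lock_exclusive(self):
        # Exclusive access is only possible when nobody else holds the handle.
        if self.exclusive or self.shared_holders > 0:
            return False
        self.exclusive = True
        return True


def open_bulk_cursor(handle):
    """Bulk open needs exclusive access; fail with EBUSY instead of blocking."""
    if not handle.try_lock_exclusive():
        raise OSError(errno.EBUSY, "handle is in use")
    return "bulk"


def open_cursor_bulk_try(handle):
    """Hypothetical 'bulk=try': prefer a bulk cursor, fall back on EBUSY."""
    try:
        return open_bulk_cursor(handle)
    except OSError as exc:
        if exc.errno != errno.EBUSY:
            raise
        return "ordinary"  # no bulk load, but no waiting on the checkpoint


handle = Handle()
handle.try_lock_shared()                 # checkpoint in progress
print(open_cursor_bulk_try(handle))      # falls back to an ordinary cursor
handle.unlock_shared()                   # checkpoint finished
print(open_cursor_bulk_try(handle))      # bulk cursor is now available
```

The point of the design is that the caller never waits on the checkpoint's shared lock: it either gets exclusive access immediately or proceeds with an ordinary cursor, so any locks held higher up are not pinned for the duration of the checkpoint.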