[SERVER-29011] Compact Calls to WiredTiger take multiple overlapping WT_SESSION objects Created: 28/Apr/17 Updated: 30/Oct/23 Resolved: 04/May/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage, WiredTiger |
| Affects Version/s: | 3.5.7 |
| Fix Version/s: | 3.4.6, 3.5.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Hows | Assignee: | David Hows |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | bkp | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v3.4 |
| Steps To Reproduce: | Instrument code to show all session take/return calls. |
| Sprint: | Storage 2017-05-08, Storage 2017-05-29 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 0 | ||||||||
| Description |
|
We have seen build failures with a stuck cache when running the FSM suite while compact tasks are in flight. On closer inspection, the compact operation appears to run over multiple WT_SESSION objects. A first session, with an "empty" transaction, is opened in the early stages of the command; subsequent sessions are then taken from the session cache to run compact on the record store and on each index. This can cause problems in testing because a single transaction stays open for the duration of all the compact operations. There may also be scope for a fuller review of the WiredTiger KV Engine to find other places that exhibit the same behaviour: opening sessions with transactions that are never used, or taking sessions directly from the session cache. |
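The session pattern described above can be sketched as a small, self-contained C++ model. Every name in it (`SessionCache`, `ModelSession`, `compactWithExtraSessions`) is hypothetical and only mimics the roles involved; it is not the MongoDB or WiredTiger API, and the real compact call is replaced by a placeholder. Under those assumptions, the model counts how many sessions one operation holds at once and how many compacts run while the first session's empty transaction is still open.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model only: these types mimic the roles in the description and
// are not MongoDB or WiredTiger classes.
struct SessionCache {
    std::size_t inUse = 0;      // sessions currently handed out
    std::size_t peakInUse = 0;  // high-water mark for a single operation

    void take() {
        ++inUse;
        if (inUse > peakInUse) peakInUse = inUse;
    }
    void release() {
        assert(inUse > 0);  // releasing more than was taken is a model bug
        --inUse;
    }
};

struct ModelSession {
    bool txnOpen = false;
    std::vector<std::string> compacted;  // idents "compacted" on this session
};

// Old pattern: returns how many compact calls ran while the first session's
// "empty" transaction was still open.
std::size_t compactWithExtraSessions(SessionCache& cache,
                                     const std::vector<std::string>& idents) {
    cache.take();          // first session, taken early in the command
    ModelSession first;
    first.txnOpen = true;  // its transaction is opened but never used

    std::size_t compactsUnderOpenTxn = 0;
    for (const std::string& ident : idents) {
        cache.take();                       // extra session straight from the cache
        ModelSession worker;
        worker.compacted.push_back(ident);  // placeholder for the real compact call
        if (first.txnOpen) ++compactsUnderOpenTxn;
        cache.release();
    }

    first.txnOpen = false;  // the empty transaction only ends with the command
    cache.release();
    return compactsUnderOpenTxn;
}
```

Called with a record store and two indexes, the model returns 3 and records a peak of 2 sessions for the single operation, which is the overlap described in the ticket title.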
| Comments |
| Comment by Githook User [ 15/Jun/17 ] |
|
Author: David Hows (daveh86) <howsdav@gmail.com> Message: (cherry picked from commit 584d4a6a25ce56b07f13247b3ce7fe298b4a111e) |
| Comment by Githook User [ 02/May/17 ] |
|
Author: David Hows (daveh86) <howsdav@gmail.com> Message: |
| Comment by David Hows [ 28/Apr/17 ] |
|
After some discussion, I have limited the scope to the slower WiredTiger session methods: compact and truncate. I had initially considered create and drop, but create operations (on a record store, at least) always run within a WUOW. Drops face similar issues, as the opCtx is not currently plumbed down to the level where we perform all the drop operations. I had also considered salvage, verify and checkpoint, but these three also have problems accessing opCtx objects: salvage and verify can be used while the WT KV Engine is being instantiated, and checkpoint is run by the durability code. |
| Comment by David Hows [ 28/Apr/17 ] |
|
As noted, I found that we take extra sessions from the WiredTiger session cache to perform compact operations. I'm currently testing a change that instead takes these sessions from the opCtx/recoveryUnit and then closes the automatically opened txn (with abandonSnapshot).
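The change being tested can be sketched with a similar hypothetical model. `ModelRecoveryUnit`, `getSession` and `compactViaRecoveryUnit` are illustrative names only, not the MongoDB classes; the sketch assumes, as the comment describes, that getting the session automatically opens a transaction, and it uses an `abandonSnapshot` stand-in to close that transaction before each placeholder compact call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the proposed change. ModelRecoveryUnit only mimics the
// role of the opCtx's recovery unit; it is not the MongoDB class.
struct ModelSession {
    bool txnOpen = false;
    std::vector<std::string> compacted;  // idents "compacted" on this session
};

struct ModelRecoveryUnit {
    ModelSession session;
    bool hasSession = false;
    std::size_t cacheTakes = 0;  // sessions this operation took from the cache

    // Lazily takes one session and, as described in the comment, automatically
    // opens a transaction on it.
    ModelSession& getSession() {
        if (!hasSession) {
            hasSession = true;
            ++cacheTakes;
        }
        session.txnOpen = true;
        return session;
    }

    // Stand-in for abandonSnapshot: closes the automatically opened transaction.
    void abandonSnapshot() {
        assert(hasSession);
        session.txnOpen = false;
    }
};

// New pattern: every compact reuses the operation's single session after the
// implicit transaction has been closed. Returns how many compacts ran inside a
// transaction (expected to be zero).
std::size_t compactViaRecoveryUnit(ModelRecoveryUnit& ru,
                                   const std::vector<std::string>& idents) {
    std::size_t compactsUnderTxn = 0;
    for (const std::string& ident : idents) {
        ModelSession& s = ru.getSession();
        ru.abandonSnapshot();          // close the auto-opened txn first
        if (s.txnOpen) ++compactsUnderTxn;
        s.compacted.push_back(ident);  // placeholder for the real compact call
    }
    return compactsUnderTxn;
}
```

With the same three idents, the model takes only one session from the cache and runs no compact inside a transaction. The design choice is to reuse the session the operation already owns rather than holding a second one, so there is neither an overlapping session nor a long-lived empty transaction.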