[SERVER-64026] Update WT operations that require exclusive access to a dhandle Created: 28/Feb/22 Updated: 29/Oct/23 Resolved: 16/Mar/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.0.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Etienne Petrel | Assignee: | Jordi Olivares Provencio |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Execution Team 2022-03-07, Execution Team 2022-03-21 |
| Participants: |
| Description |
|
In wiredtiger_util.cpp, there are a few places where we call alter and verify. The scope of the work is to update all places where those operations are executed and/or the tests by
The identified failing tests are visible in this patch build: where the following message appears:
To make sure the changes are ok, perform the required tests with the tip of WiredTiger develop:
The issues described in |
| Comments |
| Comment by Githook User [ 22/Mar/22 ] | |||
|
Author: Keith Bostic <keith.bostic@mongodb.com> (keithbostic)
Message: Now that
| Comment by Githook User [ 16/Mar/22 ] | |||
|
Author: Jordi Olivares Provencio <jordi.olivares-provencio@mongodb.com> (jordiolivares)
Message:
| Comment by Keith Bostic (Inactive) [ 07/Mar/22 ] | |||
|
jordi.olivares-provencio, | |||
| Comment by Keith Bostic (Inactive) [ 04/Mar/22 ] | |||
|
jordi.olivares-provencio, the problem here is that WiredTiger is leaving imported files configured for potential bulk load, and so checkpoints of the file aren't working correctly. I've created
| Comment by Jordi Olivares Provencio [ 04/Mar/22 ] | |||
|
I've managed to slim down the code path we take into a reproducible error with the attached code and related files. I've tested it out with the same WT_SESSION and it also seems to fail. I don't know if the way we are using WiredTiger is correct in this case, but it seems as if there might be a bug.

Attached files: collection-4-4177432105351068428.wt, index-5-4177432105351068428.wt

Note: I tested this out by reverting the following commit: https://github.com/wiredtiger/wiredtiger/commit/515946e176cfd87066231d766e264a6ba068d034
| Comment by Keith Bostic (Inactive) [ 03/Mar/22 ] | |||
|
jordi.olivares-provencio, I agree, that shouldn't happen if the file is quiescent; that is, checkpoint should clear the modified flag. If there's a test case I can chase with you, please don't hesitate. Import is relatively new code, I believe, and I can imagine there's a bug there.
| Comment by Keith Bostic (Inactive) [ 02/Mar/22 ] | |||
No; this only affects the WT_SESSION::alter, WT_SESSION::drop, WT_SESSION::rename, WT_SESSION::salvage, WT_SESSION::upgrade and WT_SESSION::verify methods.
| Comment by Louis Williams [ 02/Mar/22 ] | |||
|
etienne.petrel/keith.bostic: there are certain statistics we collect, like WT_STAT_DSRC_BLOCK_REUSE_BYTES, that open dhandles. Does this change affect the collection of those statistics?
| Comment by Jordi Olivares Provencio [ 02/Mar/22 ] | |||
|
The revert commit in order to debug this can be found here | |||
| Comment by Keith Bostic (Inactive) [ 01/Mar/22 ] | |||
|
louis.williams, jordi.olivares-provencio: There's a new ticket | |||
| Comment by Alexander Gorrod [ 01/Mar/22 ] | |||
|
Thanks for looping me in. I think MongoDB uses a different mechanism to actually get a WiredTiger checkpoint done via a utility thread - but the intention is entirely correct.
| Comment by Keith Bostic (Inactive) [ 01/Mar/22 ] | |||
|
In places where it's reasonable to both attempt operations that require exclusive access, and have those operations fail because of dirty content in the cache, the following change should be sufficient:
In other words, a checkpoint should allow the operation to succeed. If it doesn't, then (1) something else must be dirtying the cache, and (2) further checkpoints are just racing with other threads, so it's unclear when or if the op will ever succeed.

I think when the second op returns EBUSY, we probably want to dig deeper and understand the reasoning: why are we trying to do something that requires exclusive access at the same time we're using the object?

As we understand further what's happening in these tests, it might be useful to update WT-8813, which lays out additional work in WiredTiger to improve this situation.

cc: alexander.gorrod
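For illustration only, the retry-once behaviour described in this comment might look roughly like the sketch below. It is not the change referenced above and is not taken from wiredtiger_util.cpp. The helper name `runExclusiveOpWithOneRetry` and both callbacks are hypothetical stand-ins: `op` represents a call such as WT_SESSION::verify or WT_SESSION::alter, and `checkpoint` represents whatever mechanism the caller uses to get a checkpoint done. Both are assumed to return 0 on success or an errno-style code.

```cpp
#include <cerrno>
#include <functional>

// Hypothetical helper (an assumption for illustration, not MongoDB or WiredTiger code).
// `op` stands in for an operation that needs exclusive access to a dhandle;
// `checkpoint` stands in for the caller's way of forcing a checkpoint.
// Both return 0 on success or an errno-style error code.
inline int runExclusiveOpWithOneRetry(const std::function<int()>& op,
                                      const std::function<int()>& checkpoint) {
    int ret = op();
    if (ret != EBUSY) {
        // Either success, or an error that a checkpoint would not resolve.
        return ret;
    }

    // EBUSY may be caused by dirty content in the cache; checkpoint and try again.
    int ckptRet = checkpoint();
    if (ckptRet != 0) {
        return ckptRet;
    }

    // Retry exactly once. A second EBUSY is handed back to the caller rather
    // than looping, since further checkpoints would only race with other threads.
    return op();
}
```

Retrying only once is deliberate in this sketch: per the comment, a second EBUSY suggests something else is using the object, which is a condition to investigate rather than to loop on.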
| Comment by Etienne Petrel [ 28/Feb/22 ] | |||
|
louis.williams, jordi.olivares-provencio, it would be great to have this done when possible; it is blocking us from updating the WT source in the MDB repo.