[SERVER-64026] Update WT operations that require exclusive access to a dhandle Created: 28/Feb/22  Updated: 29/Oct/23  Resolved: 16/Mar/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Etienne Petrel Assignee: Jordi Olivares Provencio
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File collection-4-4177432105351068428.wt, File index-5-4177432105351068428.wt, File main.cpp
Issue Links:
Depends
depends on WT-8910 Imported files should not remain conf... Closed
is depended on by WT-8892 Disallow single file checkpoint from ... Closed
Problem/Incident
is caused by WT-8695 Remove file_close_sync config and dis... Closed
Related
related to WT-8883 Change to disallow single file checkp... Closed
related to SERVER-64726 Modify db_catalog_test to account for... Closed
is related to SERVER-63605 validate_tests fails after some WT ch... Closed
is related to WT-8813 Improve access to methods requiring a... Backlog
Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2022-03-07, Execution Team 2022-03-21
Participants:

 Description   

In SERVER-63605, we aligned the tests with the changes done in WT-8695.

However, WT-8695 still causes issues in other places where we perform operations that require exclusive access to a dhandle: drop, salvage, alter, rename, verify, and upgrade.

In wiredtiger_util.cpp, there are a few places where we call alter and verify.

The scope of the work is to update all places where those operations are executed and/or the tests by

The failing tests identified so far are visible in this patch build, where the following message appears:

Device or resource busy 

To make sure the changes are ok, perform the required tests with the tip of WiredTiger develop:

cp -r $WT_HOME/src/* $MDB_HOME/src/third_party/wiredtiger/src/
cp -r $WT_HOME/dist/* $MDB_HOME/src/third_party/wiredtiger/dist/ 

The issues described in WT-8883 should no longer appear after the changes are done.



 Comments   
Comment by Githook User [ 22/Mar/22 ]

Author:

{'name': 'Keith Bostic', 'email': 'keith.bostic@mongodb.com', 'username': 'keithbostic'}

Message: WT-8892 Disallow single file checkpoint from MDB server (#7689)

Now that SERVER-64026 has committed, make the standalone build behavior of disallowing single object checkpoints the behavior for all builds.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/13d69ff4a172fd902fad1fdd23e9997a0145e7b0

Comment by Githook User [ 16/Mar/22 ]

Author:

{'name': 'Jordi Olivares Provencio', 'email': 'jordi.olivares-provencio@mongodb.com', 'username': 'jordiolivares'}

Message: SERVER-64026 Update WT operations that require exclusive access to a dhandle
Branch: master
https://github.com/mongodb/mongo/commit/dfe7541231e794887be35eff95482285f6e21f35

Comment by Keith Bostic (Inactive) [ 07/Mar/22 ]

jordi.olivares-provencio, WT-8910 has been merged into develop, so once it's dropped into master, you should be good to go.

Comment by Keith Bostic (Inactive) [ 04/Mar/22 ]

jordi.olivares-provencio, the problem here is that WiredTiger is leaving imported files configured for potential bulk load, and so checkpoints of the file aren't working correctly. I've created WT-8910 to fix the problem.

Comment by Jordi Olivares Provencio [ 04/Mar/22 ]

keith.bostic

I've managed to slim down the code path we take into a reproducible error with the attached code and related files. I've tested it out with the same WT_SESSION and it seems to also fail. I don't know if the way we are using WiredTiger is correct in this case, but it seems as if there might be a bug.

main.cpp

collection-4-4177432105351068428.wt

index-5-4177432105351068428.wt

To note: I tested this out by reverting the following commit: https://github.com/wiredtiger/wiredtiger/commit/515946e176cfd87066231d766e264a6ba068d034

Comment by Keith Bostic (Inactive) [ 03/Mar/22 ]

jordi.olivares-provencio, I agree, that shouldn't happen if the file is quiescent, that is, checkpoint should clear the modified flag. If there's a test case I can chase with you, please don't hesitate. Import is relatively new code, I believe, and I can imagine there's a bug there.

Comment by Keith Bostic (Inactive) [ 02/Mar/22 ]

louis.williams

There are certain statistics we collect, like WT_STAT_DSRC_BLOCK_REUSE_BYTES, that open dhandles. Does this change affect the collection of those statistics?

No; this only affects the WT_SESSION::alter, WT_SESSION::drop, WT_SESSION::rename, WT_SESSION::salvage, WT_SESSION::upgrade and WT_SESSION::verify methods.

Comment by Louis Williams [ 02/Mar/22 ]

etienne.petrel/keith.bostic. There are certain statistics we collect, like WT_STAT_DSRC_BLOCK_REUSE_BYTES, that open dhandles. Does this change affect the collection of those statistics?

Comment by Jordi Olivares Provencio [ 02/Mar/22 ]

The revert commit in order to debug this can be found here

Comment by Keith Bostic (Inactive) [ 01/Mar/22 ]

louis.williams, jordi.olivares-provencio: WT-8883 has been merged.

There's a new ticket, WT-8892. If/when SERVER-64026 is closed, that ticket should be scheduled to clean up the WiredTiger side.

Comment by Alexander Gorrod [ 01/Mar/22 ]

Thanks for looping me in. I think MongoDB uses a different mechanism to actually get a WiredTiger checkpoint done via a utility thread - but the intention is entirely correct.

Comment by Keith Bostic (Inactive) [ 01/Mar/22 ]

In places where it's reasonable both to attempt operations that require exclusive access and to have those operations fail because of dirty content in the cache, the following change should be sufficient:

ret = op();
if (ret == EBUSY) {
    session.checkpoint();
    ret = op(); // retry once after the checkpoint
}
return (ret); // further checkpoints not likely to help

In other words, checkpoint should allow the operation to succeed, and if it doesn’t then (1) something else must be dirtying the cache, and (2) further checkpoints are just racing with other threads, so it’s unclear when or if the op will ever succeed.

I think when the second op returns EBUSY, then we probably want to dig deeper, and understand the reasoning — why are we trying to do something that requires exclusive access at the same time we’re using the object?
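For illustration, the checkpoint-then-retry-once pattern described above could be sketched in C++ roughly as follows. This is a minimal sketch, not the code that was committed for this ticket: `ExclusiveOpResult` and `runWithOneCheckpointRetry` are hypothetical names, and the operation and the checkpoint are passed in as callables so the sketch makes no assumption about how the server actually triggers a WiredTiger checkpoint.

```cpp
#include <cerrno>
#include <functional>

// Hypothetical helper for illustration only; not MongoDB or WiredTiger code.
// Outcome of an exclusive-access operation run with at most one
// checkpoint-and-retry cycle.
struct ExclusiveOpResult {
    int ret;                   // final return code (0 on success)
    bool checkpointAttempted;  // true if a checkpoint was tried after EBUSY
};

// Runs `op`. If it reports EBUSY, runs `checkpoint` once and retries `op`
// exactly once. It never loops: further checkpoints are unlikely to help.
ExclusiveOpResult runWithOneCheckpointRetry(const std::function<int()>& op,
                                            const std::function<int()>& checkpoint) {
    int ret = op();
    if (ret != EBUSY)
        return {ret, false};

    // Dirty content in the cache may be blocking exclusive access; flush it.
    int ckptRet = checkpoint();
    if (ckptRet != 0)
        return {ckptRet, true};

    // Single retry. A second EBUSY suggests something else is still using or
    // dirtying the object, so the caller should investigate, not retry again.
    return {op(), true};
}
```

With the WiredTiger C API, `op` might wrap a call such as `session->verify(session, uri, nullptr)` and `checkpoint` might wrap `session->checkpoint(session, nullptr)`. A caller that sees `ret == EBUSY` together with `checkpointAttempted == true` has hit the second-EBUSY case and can log it for follow-up.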

As we understand further what's happening in these tests, it might be useful to update WT-8813, which lays out additional work in WiredTiger to improve this situation.

cc: alexander.gorrod

Comment by Etienne Petrel [ 28/Feb/22 ]

louis.williams, jordi.olivares-provencio, it would be great to have this done when possible, as it is blocking us from updating the WT source in the MDB repo.
Also, would you know why we did not hit those issues when SERVER-63605 was completed?

Generated at Thu Feb 08 05:59:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.