[SERVER-63605] validate_tests fails after some WT changes Created: 14/Feb/22  Updated: 29/Oct/23  Resolved: 23/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc0, 5.3.0-rc2

Type: Bug Priority: Major - P3
Reporter: Etienne Petrel Assignee: Jordi Olivares Provencio
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by WT-8695 Remove file_close_sync config and dis... Closed
Related
related to WT-8883 Change to disallow single file checkp... Closed
related to SERVER-64026 Update WT operations that require exc... Closed
related to WT-8892 Disallow single file checkpoint from ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.3
Steps To Reproduce:
  • Check out the latest MDB commit
  • Check out the following commit of WT that corresponds to WT-8695: 2d48d69564f24dce5edf3384431b617194864a27
  • Import the src files of wiredtiger into mongo/src/third_party/wiredtiger
    • cp -r $WT_HOME/src/* $MDB_HOME/src/third_party/wiredtiger/src/
    • cp -r $WT_HOME/dist/* $MDB_HOME/src/third_party/wiredtiger/dist/
  • Compile MDB
    • ninja -j400 install-all
  • Execute validate_tests
    • cd ./build/opt/install/bin/
    • ./dbtest validate_tests --dur --enableMajorityReadConcern=True --storageEngine=wiredTiger
Sprint: Execution Team 2022-03-07
Participants:
Linked BF Score: 65

 Description   

We recently merged WT-8695 but had to revert it because it caused failures on the MDB side.
We saw the issue in run_dbtest on ! Shared Library Linux DEBUG. The validate_tests test failed because it found an unexpected number of warnings:

{"namespace":"unittests.validate_tests","ident":"collection-158-6794401668041455030","commitTimestamp":{"$timestamp":{"t":0,"i":0}}}}
[db_test:validate_tests] {"t":{"$date":"2022-02-14T00:06:47.740Z"},"s":"I",  "c":"TEST",     "id":4680100, "ctx":"testsuite","msg":"FAIL","attr":{"test":"ValidateTests::ValidateMissingAndExtraIndexEntryResults<false, false>","type":"TestAssertionFailureException","error":"Expected static_cast<size_t>(2) == results.warnings.size() (2 == 5)...



 Comments   
Comment by Githook User [ 25/Feb/22 ]

Author:

{'name': 'Jordi Olivares Provencio', 'email': 'jordi.olivares-provencio@mongodb.com', 'username': 'jordiolivares'}

Message: SERVER-63605 Modify test to ignore transient warnings

(cherry picked from commit 2fca2d4e19a642fd97374dfde65bc04799b73f58)
Branch: v5.3
https://github.com/mongodb/mongo/commit/da453c7188401772c03dc59d7ce417fcf86459bd

Comment by Githook User [ 23/Feb/22 ]

Author:

{'name': 'Jordi Olivares Provencio', 'email': 'jordi.olivares-provencio@mongodb.com', 'username': 'jordiolivares'}

Message: SERVER-63605 Modify test to ignore transient warnings
Branch: master
https://github.com/mongodb/mongo/commit/2fca2d4e19a642fd97374dfde65bc04799b73f58

Comment by Luke Chen [ 23/Feb/22 ]

As this ticket is a dependency of WT-8695 (a 5.3 blocker), I added the "5.3 Required" fixversion and requested the backport for this ticket. It would be great if it can be resolved (and backported) soon - the 5.3 code freeze is in 2 weeks' time.

Comment by Keith Bostic (Inactive) [ 22/Feb/22 ]

Thank you, louis.williams.

If you're OK changing the test to ignore those warnings, that would be great!

Comment by Louis Williams [ 22/Feb/22 ]

The unit test failures appear to be caused by the test receiving more warnings from validate than expected. The extra warnings have the following form: "Could not complete validation of table:collection-306-6794401668041455030. This is a transient issue as the collection was actively in use by other operations."

We return this as a warning when we get an EBUSY error validating the tables, so this is entirely expected.

We can either change the tests to ignore these messages, or change validate to use the code Chenhao provided. The latter would probably be too consequential since we run validate frequently in testing, and these tests would probably take much longer to complete.
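
The first option, having the test ignore the transient warnings, could look roughly like the sketch below. It is illustrative only: the helper names is_transient_warning and count_non_transient are hypothetical, the real test is C++ and may filter differently, and the matched substring is simply taken from the warning text quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Substring taken from the warning quoted in this ticket. */
static const char TRANSIENT_MARKER[] = "This is a transient issue";

/* Hypothetical helper: returns 1 if the warning is the transient EBUSY kind. */
static int is_transient_warning(const char *warning)
{
    assert(warning != NULL);
    return strstr(warning, TRANSIENT_MARKER) != NULL;
}

/* Hypothetical helper: counts the warnings a test should still assert on. */
static size_t count_non_transient(const char *const *warnings, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (!is_transient_warning(warnings[i]))
            kept++;
    }
    return kept;
}
```

A test written this way would compare its expected warning count against count_non_transient(...) rather than against the raw number of warnings, so an occasional EBUSY during validation no longer changes the outcome.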

Comment by Keith Bostic (Inactive) [ 21/Feb/22 ]

geert.bosch, louis.williams, could we get someone to take a fast look at SERVER-63605? We’re hoping this is just a small test change and it’s blocking a change we’d like to get pushed: WT-8695.

The summary is that we’re going to stop allowing operations such as alter or verify, which require exclusive access to files, to proceed if there is dirty data in the cache. Historically we checkpointed the file to flush its dirty data from the cache and allow exclusive access, but that leads to data inconsistency because the history-store file is not checkpointed along with the file.

cc: alexander.gorrod

Comment by Chenhao Qu [ 14/Feb/22 ]

WiredTiger intends to disallow single-file checkpoints in WT-8695. This means that any operation requiring exclusive access to a dhandle may return EBUSY if the tree is dirty. The application should run a database-wide checkpoint before each retry of these operations. The server team should fix all the tests that assume drop, salvage, alter, rename, verify, and upgrade will succeed.

The code should look like:

/* uri names the object being verified; NULL selects the default config. */
while (session->verify(session, uri, NULL) == EBUSY)
       session->checkpoint(session, NULL);
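
A slightly fuller version of the loop above, offered as a sketch only. It bounds the number of retries and stops on any error other than EBUSY, which the short loop does not do. To stay self-contained it does not include wiredtiger.h: the verify_fn and checkpoint_fn callbacks are hypothetical stand-ins for session->verify and session->checkpoint, and verify_with_checkpoint_retry, max_attempts, and the fake_session test double are names invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for session->verify and session->checkpoint. */
typedef int (*verify_fn)(void *ctx);
typedef int (*checkpoint_fn)(void *ctx);

/*
 * Retry verify while it reports EBUSY, running a database-wide checkpoint
 * before each retry. Returns 0 on success, the first non-EBUSY error from
 * verify, any error from checkpoint, or EBUSY if the object is still busy
 * after max_attempts tries.
 */
static int verify_with_checkpoint_retry(
    verify_fn verify, checkpoint_fn checkpoint, void *ctx, int max_attempts)
{
    int ret;

    assert(verify != NULL && checkpoint != NULL && max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (attempt > 0) {
            /* Flush dirty data for the whole database before retrying. */
            ret = checkpoint(ctx);
            if (ret != 0)
                return ret;
        }
        ret = verify(ctx);
        if (ret != EBUSY)
            return ret; /* success or a hard error: stop retrying */
    }
    return EBUSY;
}

/* Test double (not WiredTiger): verify is busy a fixed number of times. */
struct fake_session {
    int busy_remaining; /* EBUSY results still to be returned by verify */
    int checkpoints;    /* number of checkpoint calls observed */
};

static int fake_verify(void *ctx)
{
    struct fake_session *s = ctx;

    if (s->busy_remaining > 0) {
        s->busy_remaining--;
        return EBUSY;
    }
    return 0;
}

static int fake_checkpoint(void *ctx)
{
    struct fake_session *s = ctx;

    s->checkpoints++;
    return 0;
}
```

With the test double, a session that is busy twice succeeds on the third attempt after two checkpoints, while a session that stays busy gives up with EBUSY once max_attempts is reached. As noted elsewhere in this ticket, retrying like this inside validate would likely make frequently run tests slower, so it is shown here only to illustrate the pattern.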

Comment by Etienne Petrel [ 14/Feb/22 ]

geert.bosch, would your team be the right one to work on this ticket?

chenhao.qu, can you provide more context about the intent of WT-8695?

Generated at Thu Feb 08 05:58:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.