[SERVER-79869] The WiredTiger event handler is not used correctly when calling compact Created: 09/Aug/23  Updated: 29/Oct/23  Resolved: 10/Aug/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Etienne Petrel Assignee: Benety Goh
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-75814 MDB compact commands running in testi... Open
is related to WT-11434 Check if foreground compaction has be... Closed
is related to SERVER-70201 Make compact killable Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Include WT-11434 in MongoDB and execute the storage_wiredtiger_test test suite.

Sprint: Execution NAMR Team 2023-08-21
Participants:

 Description   

The WiredTiger team attempted a drop yesterday; see the patch build, which contains failures related to compact testing in MDB.

During compaction, we periodically check whether it has been interrupted by the application or whether the timeout (if configured) has elapsed; see here.

With WT-11434, we now perform those checks before taking a checkpoint as part of compaction; see here.

To determine whether the application has interrupted compaction, we invoke the event handler registered by the application with the WT_EVENT_COMPACT_CHECK event type.
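The check-then-continue pattern described above can be sketched as a simple loop. This is a simplified, hypothetical model, not WiredTiger's implementation: `CompactCheckFn` stands in for the application's event handler (the real hook is `WT_EVENT_HANDLER::handle_general` receiving `WT_EVENT_COMPACT_CHECK`), and `runCompact` and `CompactResult` are illustrative names. The point is only that the application callback is consulted before every unit of work, including the checkpoint step that WT-11434 now guards.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical stand-in for the application's compact-check callback:
// returns nonzero to ask compaction to stop.
using CompactCheckFn = std::function<int()>;

enum class CompactResult { Done, Interrupted, TimedOut };

// Simplified compaction driver (illustrative only): before each step, which in
// this model ends with a checkpoint, ask the application whether to stop and
// check the optional timeout. A zero timeout means "no timeout configured".
CompactResult runCompact(int steps,
                         const CompactCheckFn& appCheck,
                         std::chrono::steady_clock::duration timeout =
                             std::chrono::steady_clock::duration::zero()) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < steps; ++i) {
        // Interruption check via the (hypothetical) event-handler callback.
        if (appCheck && appCheck() != 0)
            return CompactResult::Interrupted;
        // Timeout check, only if a timeout was configured.
        if (timeout != std::chrono::steady_clock::duration::zero() &&
            std::chrono::steady_clock::now() - start >= timeout)
            return CompactResult::TimedOut;
        // ... compact some pages, then take a checkpoint ...
    }
    return CompactResult::Done;
}
```

Because the callback runs on every iteration, any handler that assumes state the caller never set up will be hit as soon as this code path executes. That is exactly what surfaced once WT-11434 added the check before the checkpoint.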

Related work was done in SERVER-70201, and the issue is in this new code (see here):

    OperationContext* opCtx = reinterpret_cast<OperationContext*>(session->app_private);
    invariant(opCtx);

The session->app_private field is set only if a SessionDataRAII is constructed and used for the compact operation, which is the case here.
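To illustrate why `app_private` can be null, here is a hypothetical RAII guard in the spirit of SessionDataRAII. It is not its real definition. `FakeSession` and `FakeOpCtx` are minimal stand-ins for `WT_SESSION` and `OperationContext`, and clearing the pointer in the destructor is an assumption of this sketch. Only code paths that construct such a guard will see a non-null `app_private`.

```cpp
#include <cassert>

// Hypothetical minimal stand-ins: WT_SESSION exposes a `void* app_private`
// field; OperationContext is reduced to an empty stub.
struct FakeSession { void* app_private = nullptr; };
struct FakeOpCtx {};

// Illustrative guard (assumed behavior, not SessionDataRAII's actual code):
// attach the operation context to the session for the guard's lifetime.
class SessionDataGuard {
public:
    SessionDataGuard(FakeSession* session, FakeOpCtx* opCtx) : _session(session) {
        _session->app_private = opCtx;
    }
    // Detach on scope exit so the session does not keep a dangling pointer.
    ~SessionDataGuard() { _session->app_private = nullptr; }

    SessionDataGuard(const SessionDataGuard&) = delete;
    SessionDataGuard& operator=(const SessionDataGuard&) = delete;

private:
    FakeSession* _session;
};
```

A caller that runs compact on a bare session, without such a guard, leaves `app_private` as `nullptr`. The event handler then has no `OperationContext` to consult.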

However, other calls to compact in the MDB layer, such as the one in wiredtiger_util_test.cpp, do not use SessionDataRAII. For those calls, session->app_private is null when the event handler is invoked, and the invariant fails. There may be other calls to compact that trigger the same issue.

We checked why this did not fail before WT-11434. It turns out that the code that triggers the event handler was never executed, which might be related to SERVER-75814.

The suggested code change is the following:

diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp
index bda44e1e9ac..11e94fde9f9 100644
--- a/src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp
@@ -862,12 +862,11 @@ int mdb_handle_general(WT_EVENT_HANDLER* handler,
                        WT_SESSION* session,
                        WT_EVENT_TYPE type,
                        void* arg) {
-    if (type != WT_EVENT_COMPACT_CHECK) {
+    if (type != WT_EVENT_COMPACT_CHECK || session == nullptr || session->app_private == nullptr) {
         return 0;
     }
 
     OperationContext* opCtx = reinterpret_cast<OperationContext*>(session->app_private);
-    invariant(opCtx);
 
     Status status = opCtx->checkForInterruptNoAssert();
     if (!status.isOK()) {
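
The guarded handler from the diff can be exercised in isolation with stand-in types. This is a minimal sketch under stated assumptions. `FakeSession`, `FakeOpCtx`, `FakeEventType` and `handleGeneral` are hypothetical. The return value on interruption (`-1` here) is also an assumption, because the diff above is cut off before that branch. The sketch uses `static_cast` from `void*`, which is sufficient for this conversion.

```cpp
#include <cassert>

// Hypothetical stand-ins for the WiredTiger/MongoDB types used by mdb_handle_general.
enum FakeEventType { FAKE_EVENT_COMPACT_CHECK, FAKE_EVENT_OTHER };
struct FakeSession { void* app_private = nullptr; };
struct FakeOpCtx {
    bool interrupted = false;
    // Models checkForInterruptNoAssert(): true means "OK, not interrupted".
    bool checkForInterruptNoAssert() const { return !interrupted; }
};

// Same shape as the patched handler: ignore unrelated events, and treat a
// missing session or missing OperationContext as "keep going" instead of
// failing an invariant.
int handleGeneral(FakeSession* session, FakeEventType type) {
    if (type != FAKE_EVENT_COMPACT_CHECK || session == nullptr ||
        session->app_private == nullptr) {
        return 0;  // continue compaction
    }
    auto* opCtx = static_cast<FakeOpCtx*>(session->app_private);
    // Assumed convention for this sketch: nonzero asks compaction to stop.
    return opCtx->checkForInterruptNoAssert() ? 0 : -1;
}
```

With the guard in place, a test like the one in wiredtiger_util_test.cpp, which calls compact on a session without an attached `OperationContext`, simply gets "not interrupted" instead of hitting the invariant.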



 Comments   
Comment by Githook User [ 10/Aug/23 ]

Author:

Etienne Petrel <etienne.petrel@mongodb.com> (username: etienneptl)

Message: SERVER-79869 fix WiredTiger event handler when calling compact
Branch: master
https://github.com/mongodb/mongo/commit/8fa3571a8df859496a420730683cbb7d2a2289d1

Comment by Etienne Petrel [ 09/Aug/23 ]

benety.goh@mongodb.com, gregory.wlodarek@mongodb.com, can you have a look, please? This is blocking the WiredTiger drop, and we would prefer not to revert our code changes because they are correct. If you are not convinced by the suggested solution, I am happy to discuss it further; temporarily disabling the failing tests would also be great. Thank you.

cc: y.ershov@mongodb.com, sean.watt@mongodb.com

Generated at Thu Feb 08 06:42:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.