- Type: Bug
- Resolution: Won't Fix
- Priority: Critical - P2
- None
- Affects Version/s: None
- Component/s: None
Creating this to track a possible bug identified in a patch build. The patch was an attempt to reproduce the WT_NOTFOUND bug that's been hanging around.
Essentially I created a mongodb patch build with this diff (applied on top of v4.4):
src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp

diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
index f1d127e774..c74f7503b1 100644
--- a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
@@ -1597,6 +1597,15 @@ Status WiredTigerRecordStore::updateRecord(OperationContext* opCtx,
     invariant(c);
     setKey(c, id);
     int ret = wiredTigerPrepareConflictRetry(opCtx, [&] { return c->search(c); });
+    if (ret == 0) {
+        std::int64_t key;
+        c->get_key(c, &key);
+        c->flags = c->flags | WT_CURSTD_DEBUG_RESET_EVICT;
+        c->reset(c);
+        c->set_key(c, key);
+        c->flags = c->flags & ~WT_CURSTD_DEBUG_RESET_EVICT;
+        ret = wiredTigerPrepareConflictRetry(opCtx, [&] { return c->search(c); });
+    }
     invariantWTOK(ret);
     WT_ITEM old_value;
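To make the order of operations in the diff easier to follow, here is a minimal, self-contained sketch of the same sequence (search, set the debug flag, reset, restore the key, clear the flag, search again) against a toy cursor. Everything in it is an assumption for illustration: `MockCursor`, `kDebugResetEvict`, and `searchEvictSearch` are hypothetical names, the "eviction" is only a counter, and none of it is WiredTiger or MongoDB code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical flag bit; stands in for WT_CURSTD_DEBUG_RESET_EVICT.
constexpr std::uint32_t kDebugResetEvict = 1u << 0;

// Toy stand-in for a storage cursor over an in-memory table.
// This is NOT the WiredTiger cursor; it only models the call order.
struct MockCursor {
    explicit MockCursor(std::map<std::int64_t, std::string>* t) : table(t) {}

    std::map<std::int64_t, std::string>* table;
    std::uint32_t flags = 0;
    std::int64_t key = 0;
    bool hasKey = false;
    int evictions = 0;  // counts resets performed while the debug flag was set

    void setKey(std::int64_t k) {
        key = k;
        hasKey = true;
    }

    // Returns 0 on success, -1 when the key is absent (stand-in for WT_NOTFOUND).
    int search() const {
        if (!hasKey || table->find(key) == table->end()) {
            return -1;
        }
        return 0;
    }

    // Drops the key; with the debug flag set, also records a simulated eviction.
    void reset() {
        if (flags & kDebugResetEvict) {
            ++evictions;
        }
        hasKey = false;
    }
};

// Mirrors the sequence in the patch: a successful search is followed by a
// flagged reset and a second search for the same key.
int searchEvictSearch(MockCursor& c, std::int64_t id) {
    c.setKey(id);
    int ret = c.search();
    if (ret == 0) {
        const std::int64_t saved = c.key;  // remember the key before reset drops it
        c.flags |= kDebugResetEvict;       // request "evict on reset"
        c.reset();
        c.setKey(saved);
        c.flags &= ~kDebugResetEvict;      // restore normal reset behaviour
        ret = c.search();
    }
    return ret;
}
```

In this toy model the second search always succeeds when the first one does, which is the expectation the patch's `invariantWTOK(ret)` encodes. The failures reported in this ticket are cases where the real system did not behave that way.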
I then ran the patch build on mongodb-mongo-v4.4, and it reproduced both a segfault and a WT_NOTFOUND.
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T10:18:57.291+0000 | 2020-11-10T10:18:57.291Z I STORAGE 4795906 [main] "WiredTiger opened","attr":{"durationMillis":19}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T10:18:57.291+0000 | 2020-11-10T10:18:57.291Z I RECOVERY 23987 [main] "WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T10:18:57.304+0000 | 2020-11-10T10:18:57.304Z F - 23083 [main] "Invariant failure","attr":{"expr":"ret","error":"UnknownError: -31803: WT_NOTFOUND: item not found","file":"src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp","line":1609}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T10:18:57.304+0000 | 2020-11-10T10:18:57.304Z F - 23084 [main] "\n\n***aborting after invariant() failure\n\n"
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T10:18:57.304+0000 | 2020-11-10T10:18:57.304Z F CONTROL 4757800 [main] "Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}

[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T11:42:06.784+0000 | 2020-11-10T11:42:06.783Z I STORAGE 4795906 [main] "WiredTiger opened","attr":{"durationMillis":22}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T11:42:06.785+0000 | 2020-11-10T11:42:06.783Z I RECOVERY 23987 [main] "WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T11:42:06.799+0000 | 2020-11-10T11:42:06.799Z F CONTROL 4757800 [main] "Writing fatal message","attr":{"message":"Invalid access at address: 0x13e400000000"}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T11:42:06.799+0000 | 2020-11-10T11:42:06.799Z F CONTROL 4757800 [main] "Writing fatal message","attr":{"message":"Got signal: 11 (Segmentation fault).\n"}
The scenario seen in this test doesn't look like the same one as the cases we're investigating. The failing test is a unit test: storage_wiredtiger_prefixed_record_store_and_index_test