-
Type: Bug
-
Resolution: Won't Fix
-
Priority: Critical - P2
-
Affects Version/s: None
-
Component/s: None
Creating this to track a possible bug identified in a patch build. The patch was an attempt to reproduce the WT_NOTFOUND bug that has been hanging around.
Essentially, I created a MongoDB patch build with this diff (applied on top of v4.4):
src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
index f1d127e774..c74f7503b1 100644
--- a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
@@ -1597,6 +1597,15 @@ Status WiredTigerRecordStore::updateRecord(OperationContext* opCtx,
     invariant(c);
     setKey(c, id);
     int ret = wiredTigerPrepareConflictRetry(opCtx, [&] { return c->search(c); });
+    if (ret == 0) {
+        std::int64_t key;
+        c->get_key(c, &key);
+        c->flags = c->flags | WT_CURSTD_DEBUG_RESET_EVICT;
+        c->reset(c);
+        c->set_key(c, key);
+        c->flags = c->flags & ~WT_CURSTD_DEBUG_RESET_EVICT;
+        ret = wiredTigerPrepareConflictRetry(opCtx, [&] { return c->search(c); });
+    }
     invariantWTOK(ret);
     WT_ITEM old_value;
I then ran it as a patch build on mongodb-mongo-v4.4, and it reproduced both a segfault and a WT_NOTFOUND.
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T10:18:57.291+0000 | 2020-11-10T10:18:57.291Z I STORAGE 4795906 [main] "WiredTiger opened","attr":{"durationMillis":19}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T10:18:57.291+0000 | 2020-11-10T10:18:57.291Z I RECOVERY 23987 [main] "WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T10:18:57.304+0000 | 2020-11-10T10:18:57.304Z F - 23083 [main] "Invariant failure","attr":{"expr":"ret","error":"UnknownError: -31803: WT_NOTFOUND: item not found","file":"src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp","line":1609}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T10:18:57.304+0000 | 2020-11-10T10:18:57.304Z F - 23084 [main] "\n\n***aborting after invariant() failure\n\n"
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T10:18:57.304+0000 | 2020-11-10T10:18:57.304Z F CONTROL 4757800 [main] "Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T11:42:06.784+0000 | 2020-11-10T11:42:06.783Z I STORAGE 4795906 [main] "WiredTiger opened","attr":{"durationMillis":22}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T11:42:06.785+0000 | 2020-11-10T11:42:06.783Z I RECOVERY 23987 [main] "WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T11:42:06.799+0000 | 2020-11-10T11:42:06.799Z F CONTROL 4757800 [main] "Writing fatal message","attr":{"message":"Invalid access at address: 0x13e400000000"}
[cpp_unit_test:storage_wiredtiger_prefixed_record_store_and_index_test] 2020-11-10T11:42:06.799+0000 | 2020-11-10T11:42:06.799Z F CONTROL 4757800 [main] "Writing fatal message","attr":{"message":"Got signal: 11 (Segmentation fault).\n"}
The scenario seen in this test doesn't look the same as the ones we're investigating. The failing test is a unit test: storage_wiredtiger_prefixed_record_store_and_index_test
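
For reference, the pattern the diff exercises (search, save the key, reset the cursor, set the key again, search a second time) can be sketched in isolation. This is only an illustration against a hypothetical in-memory `MockCursor`; it is not the WiredTiger API, it does not model forced eviction or `WT_CURSTD_DEBUG_RESET_EVICT`, and the names `MockCursor`, `kNotFound` and `searchResetSearch` are made up for this sketch. Unlike the diff, it also checks the return value of the key read before reusing the key.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Same numeric value the log above reports for WT_NOTFOUND; used here only
// as a stand-in return code for the mock.
constexpr int kNotFound = -31803;

// Hypothetical stand-in for a storage cursor over an ordered table.
// The table must outlive the cursor, since only a reference is stored.
class MockCursor {
public:
    explicit MockCursor(const std::map<std::int64_t, std::string>& table)
        : _table(table) {}

    // Records the key that the next search() will look up.
    void setKey(std::int64_t key) { _searchKey = key; }

    // Positions the cursor on the key given to setKey().
    // Returns 0 on success, kNotFound if the key is absent.
    int search() {
        _pos = _table.find(_searchKey);
        _positioned = (_pos != _table.end());
        return _positioned ? 0 : kNotFound;
    }

    // Copies out the key at the current position.
    // Returns kNotFound if the cursor is not positioned.
    int getKey(std::int64_t* key) const {
        assert(key != nullptr);
        if (!_positioned) {
            return kNotFound;
        }
        *key = _pos->first;
        return 0;
    }

    // Drops the current position, as a cursor reset would.
    void reset() { _positioned = false; }

private:
    const std::map<std::int64_t, std::string>& _table;
    std::map<std::int64_t, std::string>::const_iterator _pos;
    std::int64_t _searchKey = 0;
    bool _positioned = false;
};

// Search, remember the key, reset, then search again with the saved key.
// If the record still exists, the second search is expected to return 0.
int searchResetSearch(MockCursor& c, std::int64_t id) {
    c.setKey(id);
    int ret = c.search();
    if (ret != 0) {
        return ret;
    }
    std::int64_t key = 0;
    ret = c.getKey(&key);  // checked here; the diff ignores this return value
    if (ret != 0) {
        return ret;
    }
    c.reset();
    c.setKey(key);
    return c.search();
}
```

With a table containing keys 1 and 2, `searchResetSearch(c, 2)` returns 0 and `searchResetSearch(c, 5)` returns `kNotFound`. In the mock, a non-zero result from the second search for a key that the first search found would indicate the same kind of inconsistency the invariant in the patch is meant to catch.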