|
13 successful test trials using r3.0.1-rc0
|
|
Testing with 085ad3aa18938262a6664ccda6b5fad509c1f68e on the v3.0 branch is going well: no invariant failures so far, with three trials completed.
My plan is to conduct at least 12 trials to achieve a reasonable degree of confidence. Previously we would see the invariant failure in one out of three trials.
|
|
Saw the same invariant failure while shutting down a replica on rc9, after it had been up for a while under a constant insert workload:
2015-03-03T18:08:39.142+0000 I - [signalProcessingThread] Invariant failure: _conn->close(_conn, config) resulted in status UnknownError 16: Device or resource busy at src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 206
|
2015-03-03T18:08:39.268+0000 I CONTROL [signalProcessingThread]
|
0xf5ae39 0xefb651 0xee25aa 0xd71ed7 0xa96bea 0xac6789 0xf5a026 0xfa7894 0x7fb114d65f18 0x7fb113822b2d
|
----- BEGIN BACKTRACE -----
|
{"backtrace":[{"b":"400000","o":"B5AE39"},{"b":"400000","o":"AFB651"},{"b":"400000","o":"AE25AA"},{"b":"400000","o":"971ED7"},{"b":"400000","o":"696BEA"},{"b":"400000","o":"6C6789"},{"b":"400000","o":"B5A026"},{"b":"400000","o":"BA7894"},{"b":"7FB114D5E000","o":"7F18"},{"b":"7FB113740000","o":"E2B2D"}],"processInfo":{ "mongodbVersion" : "3.0.0-rc9", "gitVersion" : "e6577bc37a2edba81b99146934cf7bad00c6e1b2", "uname" : { "sysname" : "Linux", "release" : "3.4.73-64.112.amzn1.x86_64", "version" : "#1 SMP Tue Dec 10 01:50:05 UTC 2013", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "0EC0EADFA8F4AD45610DC4515F6732A685D68742" }, { "b" : "7FFF06C10000", "elfType" : 3, "buildId" : "6A7E58423C94E707E8B4A45F098E78D557DD5A7D" }, { "b" : "7FB114D5E000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "A8E8015F84374807F6920DF95D16BBAAAC02AB41" }, { "b" : "7FB114AF1000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "A5C6FCE56D21542B9535D5FBCFE74EBA71D24613" }, { "b" : "7FB11470D000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "75565934551BECB21A027BE0A3AE28A62AEEB7EE" }, { "b" : "7FB114505000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "CC3700E5EACA3A520D1FD1BA3B469F2EFB1B8F11" }, { "b" : "7FB114301000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "3D5EB706F159BCCB9A3B3E5F528B512E2C500F90" }, { "b" : "7FB113FF8000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "66F1CF311C61879639BD3DC0034DEE0D6D042261" }, { "b" : "7FB113CFA000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "226E54A756FDC288551092B4FDE818C8D6D6FC04" }, { "b" : "7FB113AE5000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "E77BA674F63D5C56373C03316B5E74C5C781A0BC" }, { "b" : "7FB113740000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "D40AD0538A2FFAB96E5540EF73816C675CC0AF5E" }, { "b" : "7FB114F7A000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "37F70DE2ABEA184F24518EB5CC538FF0F54E081F" }, { "b" : "7FB1134FD000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "9DF61878D8918F25CC74AD01F417FDB051DFE3DA" }, { "b" : "7FB113218000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "6F1DB0F811D1B210520443442D4437BC43BF9A80" }, { "b" : "7FB113015000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "1A6E97644CC9149C2E1871C6AE1DB51975E78A41" }, { "b" : "7FB112DEA000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "F7DF34078FD7BFD684FE46D5F677EEDA1D9B9DC9" }, { "b" : "7FB112BD4000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "87B4EBF2183C8EA4AB657212203EFFE6340E2F4F" }, { "b" : "7FB1129C9000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "381960ACAB9C39461D58BDE7B272C4F61BB3582F" }, { "b" : "7FB1127C6000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "BF48CD5658DE95CE058C4B828E81C97E2AE19643" }, { "b" : "7FB1125AF000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "F01971089728188EFA4AA473D8B2DE22CA6DE963" }, { "b" : "7FB11238E000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "803D7EF21A989677D056E52BAEB9AB5B154FB9D9" } ] }}
|
mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5ae39]
|
mongod(_ZN5mongo10logContextEPKc+0xE1) [0xefb651]
|
mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xee25aa]
|
mongod(_ZN5mongo18WiredTigerKVEngine13cleanShutdownEv+0x107) [0xd71ed7]
|
mongod(_ZN5mongo23GlobalEnvironmentMongoD34shutdownGlobalStorageEngineCleanlyEv+0x1A) [0xa96bea]
|
mongod(_ZN5mongo11exitCleanlyENS_8ExitCodeE+0x219) [0xac6789]
|
mongod(+0xB5A026) [0xf5a026]
|
mongod(+0xBA7894) [0xfa7894]
|
libpthread.so.0(+0x7F18) [0x7fb114d65f18]
|
libc.so.6(clone+0x6D) [0x7fb113822b2d]
|
----- END BACKTRACE -----
|
2015-03-03T18:08:39.268+0000 I - [signalProcessingThread]
|
|
***aborting after invariant() failure
|
Using the rc9 Amazon build on Amazon Linux (kernel 3.4.73-64.112.amzn1.x86_64).
|
|
Yes, just ^C. IIRC, data was still being written (bulk insert/update via the C# beta 2 driver) when the db was shut down. I'll see if I can repro with another test.
|
|
I have been using the following JavaScript to "fill up" the WiredTiger cache (watch top to see when RSS reaches about half the physical DRAM). After this mongo shell script finishes, send signal 2 (SIGINT) or signal 15 (SIGTERM) to the mongod process. A few trials may be needed.
'use strict';

load('jstests/core/parallelTester.js');

function insert(myCollection, desired, randSeed) {
    var batchSize = 1000;

    print('bulk loading ' + desired + ' documents to ' + myCollection +
          ' with batch size of ' + batchSize);
    var count = 0;
    var mydb = db.getSiblingDB('sbtest');
    var seed = randSeed;

    while (count < desired) {
        var bulk = mydb.getCollection(myCollection).initializeUnorderedBulkOp();
        for (var j = 0; j < batchSize; count++, j++) {
            var k = Math.round(random() * 10000000);
            var c = sysbenchString();
            var pad = '' + Math.round(random() * 100000000000);
            for (var y = 0; y < 4; y++)
                pad += '-' + Math.round(random() * 100000000000);
            var d = { _id: count, k: k, c: c, pad: pad };
            //print(tojson(d))
            bulk.insert(d);
        }
        bulk.execute();
    }

    // create indexes
    print('creating index');
    mydb.getCollection(myCollection).ensureIndex({k: 1});

    function sysbenchString() {
        var s = '' + Math.round(random() * 100000000000);
        for (var x = 0; x < 8; x++)
            s += '-' + Math.round(random() * 100000000000);
        return s;
    }

    // PRNG borrowed from http://stackoverflow.com/questions/521295/javascript-random-seeds
    function random() {
        var x = Math.sin(seed++) * 10000;
        return x - Math.floor(x);
    }
}

function simulate_sysbench_load(num_collections, num_docs_per_collection) {
    // data load phase
    print("creating data for", num_collections, " collections");

    // drop old collections
    print('dropping old collections:');
    var mydb = db.getSiblingDB('sbtest');
    for (var offset = 1; offset <= num_collections; offset++) {
        var s = 'sbtest' + offset;
        print(s);
        mydb.getCollection(s).drop();
    }

    var threads = [];

    print('\nloading new collections:');
    var t = Math.round(Date.now() / num_collections);
    for (var offset = 0; offset < num_collections; offset++) {
        var n = offset + 1;
        var s = 'sbtest' + n;
        threads[offset] = new ScopedThread(insert, s, num_docs_per_collection, (offset + 1) * t);
        threads[offset].start();
    }

    print("all threads started");
    threads.forEach(function(t) {
        t.join();
    });
    threads = [];
}

// load data
var collection_count = 16;
var docs_per_collection = 20000000;
simulate_sysbench_load(collection_count, docs_per_collection);
|
|
|
Thanks for the screenshot, nickj. Are you able to reproduce the invariant failure? I do see the command line you used to launch the server, but how did you stop it? Just ^C?
Thanks,
Ramón.
|
|
Invariant.png: Windows 8.1 64-bit, rc10.
|
|
I have seen this on Windows 8.1 64-bit with 3.0 rc10; screenshot attached.
|
|
Looks like I have one in rc9 (e6577bc37a2edba81b99146934cf7bad00c6e1b2), hit when signal 2 (Ctrl-C) was delivered with the cache full.
I was running my JavaScript-based sysbench-mongo workload (it simulates sysbench, but using JavaScript in the mongo shell).
2015-02-24T22:16:30.922+0000 I NETWORK [conn1248] end connection 127.0.0.1:43957 (10 connections now open)
|
2015-02-24T22:16:30.922+0000 I NETWORK [conn1250] end connection 127.0.0.1:43959 (9 connections now open)
|
2015-02-24T22:16:30.922+0000 I NETWORK [conn1251] end connection 127.0.0.1:43960 (9 connections now open)
|
2015-02-24T22:16:30.922+0000 I NETWORK [conn1249] end connection 127.0.0.1:43958 (9 connections now open)
|
2015-02-24T22:16:30.922+0000 I NETWORK [conn1252] end connection 127.0.0.1:43961 (3 connections now open)
|
2015-02-24T22:16:30.922+0000 I NETWORK [conn1253] end connection 127.0.0.1:43962 (3 connections now open)
|
2015-02-24T22:16:30.922+0000 I NETWORK [conn1254] end connection 127.0.0.1:43963 (3 connections now open)
|
2015-02-24T22:16:30.922+0000 I NETWORK [conn1255] end connection 127.0.0.1:43964 (2 connections now open)
|
^C2015-02-24T22:16:53.196+0000 I CONTROL [signalProcessingThread] got signal 2 (Interrupt), will terminate after current cmd ends
|
2015-02-24T22:16:53.197+0000 I CONTROL [signalProcessingThread] now exiting
|
2015-02-24T22:16:53.197+0000 I NETWORK [signalProcessingThread] shutdown: going to close listening sockets...
|
2015-02-24T22:16:53.197+0000 I NETWORK [signalProcessingThread] closing listening socket: 5
|
2015-02-24T22:16:53.197+0000 I NETWORK [signalProcessingThread] closing listening socket: 6
|
2015-02-24T22:16:53.197+0000 I NETWORK [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
|
2015-02-24T22:16:53.197+0000 I NETWORK [signalProcessingThread] shutdown: going to flush diaglog...
|
2015-02-24T22:16:53.197+0000 I NETWORK [signalProcessingThread] shutdown: going to close sockets...
|
2015-02-24T22:16:53.197+0000 I STORAGE [signalProcessingThread] WiredTigerKVEngine shutting down
|
2015-02-24T22:16:56.195+0000 E STORAGE [signalProcessingThread] WiredTiger (0) [1424816216:195428][2998:0x7f3024dba700], log-server: Connection has open file handles: sizeStorer.wt
|
2015-02-24T22:16:56.195+0000 E STORAGE WiredTiger (0) [1424816216:195514][2998:0x7f30245b9700], eviction-server: cache server: exiting with 900287 pages in memory and 900285 pages evicted
|
2015-02-24T22:16:56.195+0000 E STORAGE WiredTiger (0) [1424816216:195547][2998:0x7f30245b9700], eviction-server: cache server: exiting with 12600 bytes in memory
|
2015-02-24T22:16:56.195+0000 E STORAGE WiredTiger (0) [1424816216:195570][2998:0x7f30245b9700], eviction-server: cache server: exiting with 12351 bytes dirty and 1 pages dirty
|
2015-02-24T22:16:56.198+0000 I - [signalProcessingThread] Invariant failure: _conn->close(_conn, config) resulted in status UnknownError 16: Device or resource busy at src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 206
|
2015-02-24T22:16:56.202+0000 I CONTROL [signalProcessingThread]
|
0xf41fa9 0xeec0d1 0xed20aa 0xd57057 0xa6acfa 0xa9abe9 0xf40e56 0xf8f754 0x7f3025d9a0a4 0x7f3024ea0ccd
|
----- BEGIN BACKTRACE -----
|
{"backtrace":[{"b":"400000","o":"B41FA9"},{"b":"400000","o":"AEC0D1"},{"b":"400000","o":"AD20AA"},{"b":"400000","o":"957057"},{"b":"400000","o":"66ACFA"},{"b":"400000","o":"69ABE9"},{"b":"400000","o":"B40E56"},{"b":"400000","o":"B8F754"},{"b":"7F3025D92000","o":"80A4"},{"b":"7F3024DBB000","o":"E5CCD"}],"processInfo":{ "mongodbVersion" : "3.0.0-rc9", "gitVersion" : "e6577bc37a2edba81b99146934cf7bad00c6e1b2", "uname" : { "sysname" : "Linux", "release" : "3.12.13-gentoo", "version" : "#5 SMP Sun May 4 12:39:57 CDT 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF850FF000", "path" : "linux-vdso.so.1", "elfType" : 3 }, { "b" : "7F3025D92000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F3025B8A000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F3025986000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F302567B000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F302537A000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F3025164000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F3024DBB000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F3025FAF000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
|
mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf41fa9]
|
mongod(_ZN5mongo10logContextEPKc+0xE1) [0xeec0d1]
|
mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xed20aa]
|
mongod(_ZN5mongo18WiredTigerKVEngine13cleanShutdownEv+0x107) [0xd57057]
|
mongod(_ZN5mongo23GlobalEnvironmentMongoD34shutdownGlobalStorageEngineCleanlyEv+0x1A) [0xa6acfa]
|
mongod(_ZN5mongo11exitCleanlyENS_8ExitCodeE+0x219) [0xa9abe9]
|
mongod(+0xB40E56) [0xf40e56]
|
mongod(+0xB8F754) [0xf8f754]
|
libpthread.so.0(+0x80A4) [0x7f3025d9a0a4]
|
libc.so.6(clone+0x6D) [0x7f3024ea0ccd]
|
----- END BACKTRACE -----
|
2015-02-24T22:16:56.202+0000 I - [signalProcessingThread]
|
|
***aborting after invariant() failure
|
|
|
Seeing a very similar invariant failure, this time on _mdb_catalog.wt, in an MCI run.
Logs here: http://buildlogs.mongodb.org/MCI_rhel55/builds/40566/test/slow2_WT_0/32bit.js
Full MCI run log: https://mci.10gen.com/task_log_raw/mongodb_mongo_v3.0_rhel55_ecffa7f73a4199848bffd276140781086c6a451d_15_02_18_05_43_49_slow2_WT_rhel55/0?type=T
2015-02-18T06:42:11.179+0000 E STORAGE [signalProcessingThread] WiredTiger (0) [1424241731:179101][13959:0x2af443a32940], log-server: Connection has open file handles: _mdb_catalog.wt
|
2015-02-18T06:42:11.179+0000 E STORAGE WiredTiger (0) [1424241731:179186][13959:0x2af444434940], eviction-server: cache server: exiting with 1089 pages in memory and 1087 pages evicted
|
2015-02-18T06:42:11.179+0000 E STORAGE WiredTiger (0) [1424241731:179222][13959:0x2af444434940], eviction-server: cache server: exiting with 9214 bytes in memory
|
2015-02-18T06:42:11.179+0000 E STORAGE WiredTiger (0) [1424241731:179247][13959:0x2af444434940], eviction-server: cache server: exiting with 8965 bytes dirty and 1 pages dirty
|
2015-02-18T06:42:11.180+0000 I - [signalProcessingThread] Invariant failure: _conn->close(_conn, config) resulted in status UnknownError 16: Device or resource busy at src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 206
|
2015-02-18T06:42:11.184+0000 I CONTROL [signalProcessingThread]
|
0xf56b19 0xef48b1 0xeda81a 0xd5f737 0xa71bfa 0xaa2909 0xf559c6 0xfa4514 0x2af44040183d 0x2af44182afcd
|
----- BEGIN BACKTRACE -----
|
{"backtrace":[{"b":"400000","o":"B56B19"},{"b":"400000","o":"AF48B1"},{"b":"400000","o":"ADA81A"},{"b":"400000","o":"95F737"},{"b":"400000","o":"671BFA"},{"b":"400000","o":"6A2909"},{"b":"400000","o":"B559C6"},{"b":"400000","o":"BA4514"},{"b":"2AF4403FB000","o":"683D"},{"b":"2AF441756000","o":"D4FCD"}],"processInfo":{ "mongodbVersion" : "3.0.0-rc9-pre-", "gitVersion" : "ecffa7f73a4199848bffd276140781086c6a451d", "uname" : { "sysname" : "Linux", "release" : "2.6.18-194.el5xen", "version" : "#1 SMP Tue Mar 16 22:01:26 EDT 2010", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "2AF4403FB000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "2AF440617000", "path" : "/lib64/libssl.so.6", "elfType" : 3 }, { "b" : "2AF440865000", "path" : "/lib64/libcrypto.so.6", "elfType" : 3 }, { "b" : "2AF440BB7000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "2AF440DC0000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "2AF440FC4000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "2AF4412C5000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "2AF441548000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "2AF441756000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "2AF4401DD000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 }, { "b" : "2AF441AB0000", "path" : "/usr/lib64/libgssapi_krb5.so.2", "elfType" : 3 }, { "b" : "2AF441CDE000", "path" : "/usr/lib64/libkrb5.so.3", "elfType" : 3 }, { "b" : "2AF441F73000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3 }, { "b" : "2AF442176000", "path" : "/usr/lib64/libk5crypto.so.3", "elfType" : 3 }, { "b" : "2AF44239B000", "path" : "/lib64/libz.so.1", "elfType" : 3 }, { "b" : "2AF4425B0000", "path" : "/usr/lib64/libkrb5support.so.0", "elfType" : 3 }, { "b" : "2AF4427B8000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3 }, { "b" : "2AF4429BA000", "path" : "/lib64/libresolv.so.2", "elfType" : 3 }, { "b" : "2AF442BD0000", "path" : "/lib64/libselinux.so.1", "elfType" : 3 }, { "b" : "2AF442DE8000", "path" : "/lib64/libsepol.so.1", "elfType" : 3 } ] }}
|
mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf56b19]
|
mongod(_ZN5mongo10logContextEPKc+0xE1) [0xef48b1]
|
mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xDA) [0xeda81a]
|
mongod(_ZN5mongo18WiredTigerKVEngine13cleanShutdownEv+0x107) [0xd5f737]
|
mongod(_ZN5mongo23GlobalEnvironmentMongoD34shutdownGlobalStorageEngineCleanlyEv+0x1A) [0xa71bfa]
|
mongod(_ZN5mongo11exitCleanlyENS_8ExitCodeE+0x219) [0xaa2909]
|
mongod(+0xB559C6) [0xf559c6]
|
mongod(+0xBA4514) [0xfa4514]
|
libpthread.so.0(+0x683D) [0x2af44040183d]
|
libc.so.6(clone+0x6D) [0x2af44182afcd]
|
----- END BACKTRACE -----
|
2015-02-18T06:42:11.184+0000 I - [signalProcessingThread]
|
***aborting after invariant() failure
|
|
|
I just noticed that we're grabbing _cursorMutex in ~WiredTigerSizeStorer, so that should prevent this – the reset call above would block until thread 2's syncSizeInfo completed. So there must be something else going on...
One thing that is unusual about sizeStorer is that WiredTiger keeps thinking it is idle and closing the underlying file: maybe there is some bad interaction between that code and connection close.
|
|
michael.cahill Clean shutdown requires a global X lock, which would prevent the racing insert. The sessionCache mutex was needed because sessions are returned to the cache after their thread has released all of its locks. I think that, other than session (RecoveryUnit) creation and destruction, no other interaction with the storage engine should be possible without at least an IS lock, which conflicts with shutdown's X lock.
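To illustrate why that conflict suffices, here is a minimal, self-contained sketch of a textbook multigranularity lock compatibility matrix (illustrative only; MongoDB's actual LockManager is more elaborate, and only the standard mode names are borrowed). The point is that X is incompatible with every other mode, so any operation holding even an intent-shared lock blocks shutdown, and vice versa:

    // Illustrative sketch, not MongoDB code: standard lock-mode compatibility.
    #include <cassert>

    enum LockMode { MODE_IS, MODE_IX, MODE_S, MODE_X };

    bool compatible(LockMode held, LockMode requested) {
        static const bool matrix[4][4] = {
            //           IS     IX     S      X
            /* IS */ {true,  true,  true,  false},
            /* IX */ {true,  true,  false, false},
            /* S  */ {true,  false, true,  false},
            /* X  */ {false, false, false, false},
        };
        return matrix[held][requested];
    }

    int main() {
        assert(!compatible(MODE_IS, MODE_X));  // an IS holder blocks shutdown's X
        assert(compatible(MODE_IS, MODE_IX));  // intent modes coexist with each other
        return 0;
    }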
|
|
Hi redbeard0531, here's the sequence I'm concerned about:
Thread 1                        Thread 2
--------                        --------
cleanShutdown
  syncSizeInfo                  syncSizeInfo [blocks]
                                [gets mutex, checks magic]
  _sizeStorer.reset             ...
  ...
  _conn->close
                                _cursor->insert
|
There's nothing obviously wrong with the patch, but I believe that previously it acquired _shutdownLock implicitly in syncSizeInfo as part of getting a session, preventing the race above.
|
|
michael.cahill I don't think that is possible. The destructor is guarded by both a mutex and a "magic flag": https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/wiredtiger/wiredtiger_size_storer.cpp#L72-76. Neither should be necessary, but together they should make this impossible even if we screwed something up. Note that the sizeStorer is destroyed before the connection is closed: https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L198-206. It was changed to work this way in https://github.com/mongodb/mongo/commit/79492d9cc1885d74b31b5fe24194dbc227096d6e, which happens to be the commit where this failure appeared.
I'll take a look tomorrow to see if there is any other way an errant syncCache could happen at shutdown, but I'd expect a different error (or a crash) from using a closed cursor. Do you have any other theories? Is there anything obviously wrong in this patch: https://github.com/mongodb/mongo/commit/3eeddde0149962a3e3ed3656da7f851d17028ec7#diff-395b373fbfa1755cb5f50b30da24e670?
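For readers following along, here is a simplified sketch of the mutex-plus-magic guard described above (hypothetical names and values; the real code is at the wiredtiger_size_storer.cpp link):

    // Rough sketch of the guard pattern; not the actual implementation.
    #include <cassert>
    #include <mutex>

    class SizeStorerSketch {
    public:
        ~SizeStorerSketch() {
            // Wait out any in-flight sync, then poison the magic value so a
            // later call would trip the check below.
            std::lock_guard<std::mutex> lk(_cursorMutex);
            _magic = 0xDEAD;
            // ... close the underlying cursor ...
        }

        void syncCache() {
            std::lock_guard<std::mutex> lk(_cursorMutex);
            assert(_magic == kMagic);  // fail fast on use after destruction
            // ... write size info through _cursor ...
        }

    private:
        static const int kMagic = 0x5113;  // arbitrary illustrative value
        int _magic = kMagic;
        std::mutex _cursorMutex;
    };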
|
|
The main clue here is this message:
Connection has open file handles: sizeStorer.wt
|
What I believe that indicates is that a sizeStorer syncCache ran concurrently with WT_CONNECTION::close. There is WiredTigerSessionCache::_shutdownLock to prevent regular operations from racing in this way; I think we need the same thing for the sizeStorer, which doesn't go through the session cache.
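A minimal sketch of that proposal, assuming a reader/writer lock in the spirit of WiredTigerSessionCache::_shutdownLock (simplified, hypothetical names; not the actual patch):

    #include <mutex>
    #include <shared_mutex>  // std::shared_mutex is C++17; illustrative only

    std::shared_mutex shutdownLock;  // stand-in for _shutdownLock

    void sizeStorerSync() {
        // Shared mode: syncs may run concurrently with each other, but none
        // can overlap with a shutdown that holds the lock exclusively.
        std::shared_lock<std::shared_mutex> guard(shutdownLock);
        // ... open a cursor outside the session cache, write size info ...
    }

    void cleanShutdown() {
        // Exclusive mode: waits for in-flight syncs and blocks new ones
        // before the WiredTiger connection is closed.
        std::unique_lock<std::shared_mutex> guard(shutdownLock);
        // ... _conn->close(_conn, config) ...
    }

The design point is that the sync path stays cheap (shared acquisitions don't serialize against each other) while close gets a hard barrier against any in-flight cursor write.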
|
|
michael.cahill, note the log messages from the eviction server.
|
|
logs and telemetry used with support-tools/timeseries
|