-
Type:
Bug
-
Resolution: Done
-
Priority:
Major - P3
-
Affects Version/s: None
-
Component/s: None
-
None
-
Fully Compatible
-
ALL
-
None
-
None
-
None
-
None
-
None
-
None
-
None
444bb62c24 / Windows 64-bit 2008R2+ / jsCore_small_oplog
https://mci.10gen.com/ui/task_log_raw/mongodb_mongo_v2.6_windows_64_2k8_444bb62c24786a65334f4be214d1154dde00c71b_14_03_24_14_29_20_jsCore_small_oplog_windows_64_test?type=T
http://buildlogs.mongodb.org/mci_0.9_windows_64_2k8/builds/42921/test/jsCore_small_oplog_0/dropdb_race.js
In the small_oplog suite, this test issues the closeAllDatabases command on the master while the slave's getmore into oplog.rs is yielding. When the getmore returns from the yield, the dbtemprelease() destructor acquires a Context on the local database, which attempts to re-open the local database (which had been closed during the yield); this trips an massert, since opening a database requires a write lock. The slave gets back an error from the getmore call, and halts replication on the slave entirely (the "autoresync" server option is not set, so replication is never re-attempted). This prevents any further tests in the suite from running, as small_oplog waits for the slave to catch up to the master inbetween tests.
Assuming the above analysis is correct, possible remedies include any of:
- make closeAllDatabases not close "local"
- make dbtemprelease detect this failure
- turn on autoresync
- remove this test from the small_oplog suite
Test log excerpt:
2014-03-24T16:43:11.561+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { x: 1.0 }, name: "x_1", ns: "jstests_dur_droprace.foo" }
2014-03-24T16:43:11.561+0000 [conn987] building index using bulk method
2014-03-24T16:43:11.561+0000 [conn987] build index done. scanned 3 total records. 0 secs
2014-03-24T16:43:11.562+0000 [conn987] DatabaseHolder::closeAll path:/data/db/sconsTests
2014-03-24T16:43:11.793+0000 [conn987] command admin.$cmd command: closeAllDatabases { closeAllDatabases: 1.0 } keyUpdates:0 numYields:0 locks(micros) W:231463 reslen:436 231ms
2014-03-24T16:43:11.793+0000 [conn4] opening db: local
2014-03-24T16:43:11.793+0000 [conn4] Assertion: 15927:can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\util\stacktrace.cpp(169) mongo::printStackTrace+0x43
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\util\log.cpp(158) mongo::logContext+0x9c
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\util\assert_util.cpp(183) mongo::msgasserted+0xfb
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\db\catalog\database_holder.cpp(65) mongo::DatabaseHolder::getOrCreate+0x34e
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\db\client.cpp(357) mongo::Client::Context::_finishInit+0x85
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\db\db.h(70) mongo::dbtemprelease::~dbtemprelease+0x49
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\clientcursor.cpp(225) mongo::ClientCursor::staticYield+0x29b
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\query\plan_executor.cpp(83) mongo::PlanExecutor::getNext+0xf7
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\query\new_find.cpp(204) mongo::newGetMore+0x413
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\instance.cpp(730) mongo::receivedGetMore+0x67b
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\instance.cpp(436) mongo::assembleResponse+0x32c
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\db.cpp(204) mongo::MyMessageHandler::process+0x111
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\util\net\message_server_port.cpp(210) mongo::PortMessageServer::handleIncomingMsg+0x677
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185) boost::`anonymous namespace'::thread_start_function+0x21
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314) _callthreadstartex+0x17
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292) _threadstartex+0x7f
2014-03-24T16:43:12.691+0000 [conn4] kernel32.dll BaseThreadInitThunk+0xd
2014-03-24T16:43:12.691+0000 [conn4] local.oplog.$main
2014-03-24T16:43:12.692+0000 [conn987] note: not profiling because db went away - probably a close on: testing
2014-03-24T16:43:12.692+0000 [conn4] getmore local.oplog.$main cursorid:18746693299 ntoreturn:0 keyUpdates:0 exception: can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error code:15927 numYields:5 locks(micros) r:899043 nreturned:1 reslen:176 10025ms
2014-03-24T16:43:12.698+0000 [conn987] dropDatabase jstests_dur_droprace starting
2014-03-24T16:43:12.745+0000 [conn987] removeJournalFiles
2014-03-24T16:43:12.764+0000 [conn987] dropDatabase jstests_dur_droprace finished
2014-03-24T16:43:12.766+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.ns, filling with zeroes...
2014-03-24T16:43:13.880+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.ns, size: 16MB, took 1.114 secs
2014-03-24T16:43:13.881+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.0, filling with zeroes...
2014-03-24T16:43:19.602+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.0, size: 64MB, took 5.719 secs
2014-03-24T16:43:19.603+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "jstests_dur_droprace.foo" }
2014-03-24T16:43:19.603+0000 [conn987] added index to empty collection
2014-03-24T16:43:19.604+0000 [conn987] insert jstests_dur_droprace.foo query: { _id: ObjectId('533060a05e3a4439beec0f21') } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:6837265 6837ms
2014-03-24T16:43:19.604+0000 [conn987] command jstests_dur_droprace.$cmd command: insert { insert: "foo", documents: [ { _id: ObjectId('533060a05e3a4439beec0f21') } ], ordered: true } keyUpdates:0 numYields:0 locks(micros) w:55 reslen:56 6837ms
2014-03-24T16:43:19.611+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { x: 1.0 }, name: "x_1", ns: "jstests_dur_droprace.foo" }
2014-03-24T16:43:19.611+0000 [conn987] building index using bulk method
2014-03-24T16:43:19.611+0000 [conn987] build index done. scanned 3 total records. 0 secs
2014-03-24T16:43:19.614+0000 [conn987] dropDatabase jstests_dur_droprace starting
2014-03-24T16:43:19.666+0000 [conn987] removeJournalFiles
2014-03-24T16:43:19.682+0000 [conn987] dropDatabase jstests_dur_droprace finished
2014-03-24T16:43:19.684+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.ns, filling with zeroes...
2014-03-24T16:43:12.692+0000 [replslave] op: { $err: "can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error", code: 15927 }
2014-03-24T16:43:12.692+0000 [replslave] halting replication
2014-03-24T16:43:12.692+0000 [replslave] caught SyncException
2014-03-24T16:43:12.692+0000 [replslave] repl: sleep 10 sec before next pass
2014-03-24T16:43:22.319+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.ns, size: 16MB, took 2.633 secs
2014-03-24T16:43:22.693+0000 [replslave] all sources dead: sync error: no ts found querying remote oplog record, sleeping for 5 seconds
2014-03-24T16:43:27.694+0000 [replslave] all sources dead: sync error: no ts found querying remote oplog record, sleeping for 5 seconds
Suite log excerpt:
[2014/03/24 16:43:02.400] ******************************************* [2014/03/24 16:43:02.404] Test : drop_index.js ... [2014/03/24 16:43:02.404] Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode commands --eval testingReplication = true; D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\drop_index.js [2014/03/24 16:43:02.404] Date : Mon Mar 24 16:43:02 2014 [2014/03/24 16:43:02.502] (output suppressed; see http://buildlogs.mongodb.org/build/53305ef0d2a60f71ad0009ef/test/53306096d2a60f732a0002da/) [2014/03/24 16:43:02.620] 219.9998 ms [2014/03/24 16:43:02.704] [2014/03/24 16:43:02.704] waiting for slave (28000) to catch up to master (27999) [2014/03/24 16:43:02.704] caught up! [2014/03/24 16:43:02.704] replication ok for 9750 collections [2014/03/24 16:43:02.704] skipping D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\dropdb.js [2014/03/24 16:43:02.708] waiting for slave (28000) to catch up to master (27999) [2014/03/24 16:43:02.708] caught up! [2014/03/24 16:43:02.708] replication ok for 9878 collections [2014/03/24 16:43:02.708] ******************************************* [2014/03/24 16:43:02.708] Test : dropdb_race.js ... [2014/03/24 16:43:02.708] Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode commands --eval testingReplication = true; D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\dropdb_race.js [2014/03/24 16:43:02.708] Date : Mon Mar 24 16:43:02 2014 [2014/03/24 16:43:02.817] (output suppressed; see http://buildlogs.mongodb.org/build/53305ef0d2a60f71ad0009ef/test/53306096d2a60f737900008e/) [2014/03/24 16:44:05.459] 1.0459 minutes [2014/03/24 16:49:10.494] timed out [2014/03/24 16:49:10.494] timed out [2014/03/24 16:49:10.494] [2014/03/24 16:49:16.531] 154 tests succeeded [2014/03/24 16:49:16.531] 591 tests didn't get run [2014/03/24 16:49:16.534] replication ok for 9878 collections [2014/03/24 16:49:16.624] timed out [2014/03/24 16:49:16.624] Traceback (most recent call last): [2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 1267, in <module> [2014/03/24 16:49:16.624] main() [2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 1251, in main [2014/03/24 16:49:16.624] run_tests(tests) [2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 696, in run_tests [2014/03/24 16:49:16.624] master.wait_for_repl() [2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 306, in wait_for_repl [2014/03/24 16:49:16.627] Connection(port=self.port).testing.smokeWait.insert({}, w=2, wtimeout=5*60*1000) [2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\collection.py", line 362, in insert [2014/03/24 16:49:16.627] self.database.connection) [2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\message.py", line 248, in _do_batched_insert [2014/03/24 16:49:16.627] client._send_message(_insert_message(EMPTY.join(data), safe), safe) [2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 969, in _send_message [2014/03/24 16:49:16.627] rv = self.__check_response_to_last_error(response) [2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 911, in __check_response_to_last_error [2014/03/24 16:49:16.627] raise OperationFailure(details["err"], details["code"]) [2014/03/24 16:49:16.627] pymongo.errors.OperationFailure: timeout [2014/03/24 16:49:16.656] Running script task for command ${killall_mci|pkill mongo; pkill -9 mongod; pkill -9 mongos; pkill -f buildlogger.py; pkill -f smoke.py} in directory /data/mci/git@github.commongodb/mongo.git/mongodb-mongo-v2.6 [2014/03/24 16:49:16.656] Full command: taskkill /IM mongod.exe /IM mongos.exe /IM mongo.exe /IM test.exe /IM buildlogger.py /IM smoke.py /IM python.exe /IM cl.exe /f [2014/03/24 16:49:16.658] Command successfully started and appended to running commands
- is related to
-
SERVER-13351 Slaves halt replication if oplog getmore returns $err
-
- Closed
-