Details
-
Bug
-
Status: Closed
-
Major - P3
-
Resolution: Fixed
-
None
-
None
-
None
-
Fully Compatible
-
ALL
Description
444bb62c24 / Windows 64-bit 2008R2+ / jsCore_small_oplog
https://mci.10gen.com/ui/task_log_raw/mongodb_mongo_v2.6_windows_64_2k8_444bb62c24786a65334f4be214d1154dde00c71b_14_03_24_14_29_20_jsCore_small_oplog_windows_64_test?type=T
http://buildlogs.mongodb.org/mci_0.9_windows_64_2k8/builds/42921/test/jsCore_small_oplog_0/dropdb_race.js
In the small_oplog suite, this test issues the closeAllDatabases command on the master while the slave's getmore into oplog.rs is yielding. When the getmore returns from the yield, the dbtemprelease() destructor acquires a Context on the local database, which attempts to re-open the local database (which had been closed during the yield); this trips an massert, since opening a database requires a write lock. The slave gets back an error from the getmore call, and halts replication on the slave entirely (the "autoresync" server option is not set, so replication is never re-attempted). This prevents any further tests in the suite from running, as small_oplog waits for the slave to catch up to the master inbetween tests.
Assuming the above analysis is correct, possible remedies include any of:
- make closeAllDatabases not close "local"
- make dbtemprelease detect this failure
- turn on autoresync
- remove this test from the small_oplog suite
Test log excerpt:
2014-03-24T16:43:11.561+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { x: 1.0 }, name: "x_1", ns: "jstests_dur_droprace.foo" }
|
2014-03-24T16:43:11.561+0000 [conn987] building index using bulk method
|
2014-03-24T16:43:11.561+0000 [conn987] build index done. scanned 3 total records. 0 secs
|
2014-03-24T16:43:11.562+0000 [conn987] DatabaseHolder::closeAll path:/data/db/sconsTests
|
2014-03-24T16:43:11.793+0000 [conn987] command admin.$cmd command: closeAllDatabases { closeAllDatabases: 1.0 } keyUpdates:0 numYields:0 locks(micros) W:231463 reslen:436 231ms
|
2014-03-24T16:43:11.793+0000 [conn4] opening db: local
|
2014-03-24T16:43:11.793+0000 [conn4] Assertion: 15927:can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error
|
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\util\stacktrace.cpp(169) mongo::printStackTrace+0x43
|
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\util\log.cpp(158) mongo::logContext+0x9c
|
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\util\assert_util.cpp(183) mongo::msgasserted+0xfb
|
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\db\catalog\database_holder.cpp(65) mongo::DatabaseHolder::getOrCreate+0x34e
|
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\db\client.cpp(357) mongo::Client::Context::_finishInit+0x85
|
2014-03-24T16:43:12.690+0000 [conn4] mongod.exe ...\src\mongo\db\db.h(70) mongo::dbtemprelease::~dbtemprelease+0x49
|
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\clientcursor.cpp(225) mongo::ClientCursor::staticYield+0x29b
|
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\query\plan_executor.cpp(83) mongo::PlanExecutor::getNext+0xf7
|
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\query\new_find.cpp(204) mongo::newGetMore+0x413
|
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\instance.cpp(730) mongo::receivedGetMore+0x67b
|
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\instance.cpp(436) mongo::assembleResponse+0x32c
|
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\db\db.cpp(204) mongo::MyMessageHandler::process+0x111
|
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\mongo\util\net\message_server_port.cpp(210) mongo::PortMessageServer::handleIncomingMsg+0x677
|
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185) boost::`anonymous namespace'::thread_start_function+0x21
|
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314) _callthreadstartex+0x17
|
2014-03-24T16:43:12.691+0000 [conn4] mongod.exe f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292) _threadstartex+0x7f
|
2014-03-24T16:43:12.691+0000 [conn4] kernel32.dll BaseThreadInitThunk+0xd
|
2014-03-24T16:43:12.691+0000 [conn4] local.oplog.$main
|
2014-03-24T16:43:12.692+0000 [conn987] note: not profiling because db went away - probably a close on: testing
|
2014-03-24T16:43:12.692+0000 [conn4] getmore local.oplog.$main cursorid:18746693299 ntoreturn:0 keyUpdates:0 exception: can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error code:15927 numYields:5 locks(micros) r:899043 nreturned:1 reslen:176 10025ms
|
2014-03-24T16:43:12.698+0000 [conn987] dropDatabase jstests_dur_droprace starting
|
2014-03-24T16:43:12.745+0000 [conn987] removeJournalFiles
|
2014-03-24T16:43:12.764+0000 [conn987] dropDatabase jstests_dur_droprace finished
|
2014-03-24T16:43:12.766+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.ns, filling with zeroes...
|
2014-03-24T16:43:13.880+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.ns, size: 16MB, took 1.114 secs
|
2014-03-24T16:43:13.881+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.0, filling with zeroes...
|
2014-03-24T16:43:19.602+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.0, size: 64MB, took 5.719 secs
|
2014-03-24T16:43:19.603+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "jstests_dur_droprace.foo" }
|
2014-03-24T16:43:19.603+0000 [conn987] added index to empty collection
|
2014-03-24T16:43:19.604+0000 [conn987] insert jstests_dur_droprace.foo query: { _id: ObjectId('533060a05e3a4439beec0f21') } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:6837265 6837ms
|
2014-03-24T16:43:19.604+0000 [conn987] command jstests_dur_droprace.$cmd command: insert { insert: "foo", documents: [ { _id: ObjectId('533060a05e3a4439beec0f21') } ], ordered: true } keyUpdates:0 numYields:0 locks(micros) w:55 reslen:56 6837ms
|
2014-03-24T16:43:19.611+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { x: 1.0 }, name: "x_1", ns: "jstests_dur_droprace.foo" }
|
2014-03-24T16:43:19.611+0000 [conn987] building index using bulk method
|
2014-03-24T16:43:19.611+0000 [conn987] build index done. scanned 3 total records. 0 secs
|
2014-03-24T16:43:19.614+0000 [conn987] dropDatabase jstests_dur_droprace starting
|
2014-03-24T16:43:19.666+0000 [conn987] removeJournalFiles
|
2014-03-24T16:43:19.682+0000 [conn987] dropDatabase jstests_dur_droprace finished
|
2014-03-24T16:43:19.684+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.ns, filling with zeroes...
|
2014-03-24T16:43:12.692+0000 [replslave] op: { $err: "can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error", code: 15927 }
|
2014-03-24T16:43:12.692+0000 [replslave] halting replication
|
2014-03-24T16:43:12.692+0000 [replslave] caught SyncException
|
2014-03-24T16:43:12.692+0000 [replslave] repl: sleep 10 sec before next pass
|
2014-03-24T16:43:22.319+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.ns, size: 16MB, took 2.633 secs
|
2014-03-24T16:43:22.693+0000 [replslave] all sources dead: sync error: no ts found querying remote oplog record, sleeping for 5 seconds
|
2014-03-24T16:43:27.694+0000 [replslave] all sources dead: sync error: no ts found querying remote oplog record, sleeping for 5 seconds
|
Suite log excerpt:
[2014/03/24 16:43:02.400] *******************************************
|
[2014/03/24 16:43:02.404] Test : drop_index.js ...
|
[2014/03/24 16:43:02.404] Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode commands --eval testingReplication = true; D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\drop_index.js
|
[2014/03/24 16:43:02.404] Date : Mon Mar 24 16:43:02 2014
|
[2014/03/24 16:43:02.502] (output suppressed; see http://buildlogs.mongodb.org/build/53305ef0d2a60f71ad0009ef/test/53306096d2a60f732a0002da/)
|
[2014/03/24 16:43:02.620] 219.9998 ms
|
[2014/03/24 16:43:02.704]
|
[2014/03/24 16:43:02.704] waiting for slave (28000) to catch up to master (27999)
|
[2014/03/24 16:43:02.704] caught up!
|
[2014/03/24 16:43:02.704] replication ok for 9750 collections
|
[2014/03/24 16:43:02.704] skipping D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\dropdb.js
|
[2014/03/24 16:43:02.708] waiting for slave (28000) to catch up to master (27999)
|
[2014/03/24 16:43:02.708] caught up!
|
[2014/03/24 16:43:02.708] replication ok for 9878 collections
|
[2014/03/24 16:43:02.708] *******************************************
|
[2014/03/24 16:43:02.708] Test : dropdb_race.js ...
|
[2014/03/24 16:43:02.708] Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode commands --eval testingReplication = true; D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\dropdb_race.js
|
[2014/03/24 16:43:02.708] Date : Mon Mar 24 16:43:02 2014
|
[2014/03/24 16:43:02.817] (output suppressed; see http://buildlogs.mongodb.org/build/53305ef0d2a60f71ad0009ef/test/53306096d2a60f737900008e/)
|
[2014/03/24 16:44:05.459] 1.0459 minutes
|
[2014/03/24 16:49:10.494] timed out
|
[2014/03/24 16:49:10.494] timed out
|
[2014/03/24 16:49:10.494]
|
[2014/03/24 16:49:16.531] 154 tests succeeded
|
[2014/03/24 16:49:16.531] 591 tests didn't get run
|
[2014/03/24 16:49:16.534] replication ok for 9878 collections
|
[2014/03/24 16:49:16.624] timed out
|
[2014/03/24 16:49:16.624] Traceback (most recent call last):
|
[2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 1267, in <module>
|
[2014/03/24 16:49:16.624] main()
|
[2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 1251, in main
|
[2014/03/24 16:49:16.624] run_tests(tests)
|
[2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 696, in run_tests
|
[2014/03/24 16:49:16.624] master.wait_for_repl()
|
[2014/03/24 16:49:16.624] File "buildscripts/smoke.py", line 306, in wait_for_repl
|
[2014/03/24 16:49:16.627] Connection(port=self.port).testing.smokeWait.insert({}, w=2, wtimeout=5*60*1000)
|
[2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\collection.py", line 362, in insert
|
[2014/03/24 16:49:16.627] self.database.connection)
|
[2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\message.py", line 248, in _do_batched_insert
|
[2014/03/24 16:49:16.627] client._send_message(_insert_message(EMPTY.join(data), safe), safe)
|
[2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 969, in _send_message
|
[2014/03/24 16:49:16.627] rv = self.__check_response_to_last_error(response)
|
[2014/03/24 16:49:16.627] File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 911, in __check_response_to_last_error
|
[2014/03/24 16:49:16.627] raise OperationFailure(details["err"], details["code"])
|
[2014/03/24 16:49:16.627] pymongo.errors.OperationFailure: timeout
|
[2014/03/24 16:49:16.656] Running script task for command
|
${killall_mci|pkill mongo; pkill -9 mongod; pkill -9 mongos; pkill -f buildlogger.py; pkill -f smoke.py}
|
in directory /data/mci/git@github.commongodb/mongo.git/mongodb-mongo-v2.6
|
[2014/03/24 16:49:16.656] Full command: taskkill /IM mongod.exe /IM mongos.exe /IM mongo.exe /IM test.exe /IM buildlogger.py /IM smoke.py /IM python.exe /IM cl.exe /f
|
[2014/03/24 16:49:16.658] Command successfully started and appended to running commands
|
Attachments
Issue Links
- is related to
-
SERVER-13351 Slaves halt replication if oplog getmore returns $err
-
- Closed
-