Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-13435

dropdb_race.js in small_oplog (causes replication to halt on slave)

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0-rc3
    • Component/s: None
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      444bb62c24 / Windows 64-bit 2008R2+ / jsCore_small_oplog
      https://mci.10gen.com/ui/task_log_raw/mongodb_mongo_v2.6_windows_64_2k8_444bb62c24786a65334f4be214d1154dde00c71b_14_03_24_14_29_20_jsCore_small_oplog_windows_64_test?type=T
      http://buildlogs.mongodb.org/mci_0.9_windows_64_2k8/builds/42921/test/jsCore_small_oplog_0/dropdb_race.js

      In the small_oplog suite, this test issues the closeAllDatabases command on the master while the slave's getmore into oplog.rs is yielding. When the getmore returns from the yield, the dbtemprelease() destructor acquires a Context on the local database, which attempts to re-open the local database (which had been closed during the yield); this trips an massert, since opening a database requires a write lock. The slave gets back an error from the getmore call, and halts replication on the slave entirely (the "autoresync" server option is not set, so replication is never re-attempted). This prevents any further tests in the suite from running, as small_oplog waits for the slave to catch up to the master inbetween tests.

      Assuming the above analysis is correct, possible remedies include any of:

      • make closeAllDatabases not close "local"
      • make dbtemprelease detect this failure
      • turn on autoresync
      • remove this test from the small_oplog suite

      Test log excerpt:

      2014-03-24T16:43:11.561+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { x: 1.0 }, name: "x_1", ns: "jstests_dur_droprace.foo" }
      2014-03-24T16:43:11.561+0000 [conn987] 	 building index using bulk method
      2014-03-24T16:43:11.561+0000 [conn987] build index done.  scanned 3 total records. 0 secs
      2014-03-24T16:43:11.562+0000 [conn987] DatabaseHolder::closeAll path:/data/db/sconsTests
      2014-03-24T16:43:11.793+0000 [conn987] command admin.$cmd command: closeAllDatabases { closeAllDatabases: 1.0 } keyUpdates:0 numYields:0 locks(micros) W:231463 reslen:436 231ms
      2014-03-24T16:43:11.793+0000 [conn4] opening db:  local
      2014-03-24T16:43:11.793+0000 [conn4] Assertion: 15927:can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error
      2014-03-24T16:43:12.690+0000 [conn4] mongod.exe    ...\src\mongo\util\stacktrace.cpp(169)                           mongo::printStackTrace+0x43
      2014-03-24T16:43:12.690+0000 [conn4] mongod.exe    ...\src\mongo\util\log.cpp(158)                                  mongo::logContext+0x9c
      2014-03-24T16:43:12.690+0000 [conn4] mongod.exe    ...\src\mongo\util\assert_util.cpp(183)                          mongo::msgasserted+0xfb
      2014-03-24T16:43:12.690+0000 [conn4] mongod.exe    ...\src\mongo\db\catalog\database_holder.cpp(65)                 mongo::DatabaseHolder::getOrCreate+0x34e
      2014-03-24T16:43:12.690+0000 [conn4] mongod.exe    ...\src\mongo\db\client.cpp(357)                                 mongo::Client::Context::_finishInit+0x85
      2014-03-24T16:43:12.690+0000 [conn4] mongod.exe    ...\src\mongo\db\db.h(70)                                        mongo::dbtemprelease::~dbtemprelease+0x49
      2014-03-24T16:43:12.691+0000 [conn4] mongod.exe    ...\src\mongo\db\clientcursor.cpp(225)                           mongo::ClientCursor::staticYield+0x29b
      2014-03-24T16:43:12.691+0000 [conn4] mongod.exe    ...\src\mongo\db\query\plan_executor.cpp(83)                     mongo::PlanExecutor::getNext+0xf7
      2014-03-24T16:43:12.691+0000 [conn4] mongod.exe    ...\src\mongo\db\query\new_find.cpp(204)                         mongo::newGetMore+0x413
      2014-03-24T16:43:12.691+0000 [conn4] mongod.exe    ...\src\mongo\db\instance.cpp(730)                               mongo::receivedGetMore+0x67b
      2014-03-24T16:43:12.691+0000 [conn4] mongod.exe    ...\src\mongo\db\instance.cpp(436)                               mongo::assembleResponse+0x32c
      2014-03-24T16:43:12.691+0000 [conn4] mongod.exe    ...\src\mongo\db\db.cpp(204)                                     mongo::MyMessageHandler::process+0x111
      2014-03-24T16:43:12.691+0000 [conn4] mongod.exe    ...\src\mongo\util\net\message_server_port.cpp(210)              mongo::PortMessageServer::handleIncomingMsg+0x677
      2014-03-24T16:43:12.691+0000 [conn4] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(185)  boost::`anonymous namespace'::thread_start_function+0x21
      2014-03-24T16:43:12.691+0000 [conn4] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314)      _callthreadstartex+0x17
      2014-03-24T16:43:12.691+0000 [conn4] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292)      _threadstartex+0x7f
      2014-03-24T16:43:12.691+0000 [conn4] kernel32.dll                                                                   BaseThreadInitThunk+0xd
      2014-03-24T16:43:12.691+0000 [conn4] local.oplog.$main 
      2014-03-24T16:43:12.692+0000 [conn987] note: not profiling because db went away - probably a close on: testing
      2014-03-24T16:43:12.692+0000 [conn4] getmore local.oplog.$main cursorid:18746693299 ntoreturn:0 keyUpdates:0 exception: can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error code:15927 numYields:5 locks(micros) r:899043 nreturned:1 reslen:176 10025ms
      2014-03-24T16:43:12.698+0000 [conn987] dropDatabase jstests_dur_droprace starting
      2014-03-24T16:43:12.745+0000 [conn987] removeJournalFiles
      2014-03-24T16:43:12.764+0000 [conn987] dropDatabase jstests_dur_droprace finished
      2014-03-24T16:43:12.766+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.ns, filling with zeroes...
      2014-03-24T16:43:13.880+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.ns, size: 16MB,  took 1.114 secs
      2014-03-24T16:43:13.881+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.0, filling with zeroes...
      2014-03-24T16:43:19.602+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.0, size: 64MB,  took 5.719 secs
      2014-03-24T16:43:19.603+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "jstests_dur_droprace.foo" }
      2014-03-24T16:43:19.603+0000 [conn987] 	 added index to empty collection
      2014-03-24T16:43:19.604+0000 [conn987] insert jstests_dur_droprace.foo query: { _id: ObjectId('533060a05e3a4439beec0f21') } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:6837265 6837ms
      2014-03-24T16:43:19.604+0000 [conn987] command jstests_dur_droprace.$cmd command: insert { insert: "foo", documents: [ { _id: ObjectId('533060a05e3a4439beec0f21') } ], ordered: true } keyUpdates:0 numYields:0 locks(micros) w:55 reslen:56 6837ms
      2014-03-24T16:43:19.611+0000 [conn987] build index on: jstests_dur_droprace.foo properties: { v: 1, key: { x: 1.0 }, name: "x_1", ns: "jstests_dur_droprace.foo" }
      2014-03-24T16:43:19.611+0000 [conn987] 	 building index using bulk method
      2014-03-24T16:43:19.611+0000 [conn987] build index done.  scanned 3 total records. 0 secs
      2014-03-24T16:43:19.614+0000 [conn987] dropDatabase jstests_dur_droprace starting
      2014-03-24T16:43:19.666+0000 [conn987] removeJournalFiles
      2014-03-24T16:43:19.682+0000 [conn987] dropDatabase jstests_dur_droprace finished
      2014-03-24T16:43:19.684+0000 [FileAllocator] allocating new datafile /data/db/sconsTests\jstests_dur_droprace.ns, filling with zeroes...
      2014-03-24T16:43:12.692+0000 [replslave] op: { $err: "can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error", code: 15927 }
      2014-03-24T16:43:12.692+0000 [replslave] halting replication
      2014-03-24T16:43:12.692+0000 [replslave] caught SyncException
      2014-03-24T16:43:12.692+0000 [replslave] repl: sleep 10 sec before next pass
      2014-03-24T16:43:22.319+0000 [FileAllocator] done allocating datafile /data/db/sconsTests\jstests_dur_droprace.ns, size: 16MB,  took 2.633 secs
      2014-03-24T16:43:22.693+0000 [replslave] all sources dead: sync error: no ts found querying remote oplog record, sleeping for 5 seconds
      2014-03-24T16:43:27.694+0000 [replslave] all sources dead: sync error: no ts found querying remote oplog record, sleeping for 5 seconds

      Suite log excerpt:

       [2014/03/24 16:43:02.400]  *******************************************
       [2014/03/24 16:43:02.404]          Test : drop_index.js ...
       [2014/03/24 16:43:02.404]       Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode commands --eval testingReplication = true; D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\drop_index.js
       [2014/03/24 16:43:02.404]          Date : Mon Mar 24 16:43:02 2014
       [2014/03/24 16:43:02.502]                 (output suppressed; see http://buildlogs.mongodb.org/build/53305ef0d2a60f71ad0009ef/test/53306096d2a60f732a0002da/)
       [2014/03/24 16:43:02.620]                   219.9998 ms
       [2014/03/24 16:43:02.704]
       [2014/03/24 16:43:02.704] waiting for slave (28000) to catch up to master (27999)
       [2014/03/24 16:43:02.704] caught up!
       [2014/03/24 16:43:02.704] replication ok for 9750 collections
       [2014/03/24 16:43:02.704] skipping D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\dropdb.js
       [2014/03/24 16:43:02.708] waiting for slave (28000) to catch up to master (27999)
       [2014/03/24 16:43:02.708] caught up!
       [2014/03/24 16:43:02.708] replication ok for 9878 collections
       [2014/03/24 16:43:02.708]  *******************************************
       [2014/03/24 16:43:02.708]          Test : dropdb_race.js ...
       [2014/03/24 16:43:02.708]       Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode commands --eval testingReplication = true; D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\dropdb_race.js
       [2014/03/24 16:43:02.708]          Date : Mon Mar 24 16:43:02 2014
       [2014/03/24 16:43:02.817]                 (output suppressed; see http://buildlogs.mongodb.org/build/53305ef0d2a60f71ad0009ef/test/53306096d2a60f737900008e/)
       [2014/03/24 16:44:05.459]                     1.0459 minutes
       [2014/03/24 16:49:10.494] timed out
       [2014/03/24 16:49:10.494] timed out
       [2014/03/24 16:49:10.494]
       [2014/03/24 16:49:16.531] 154 tests succeeded
       [2014/03/24 16:49:16.531] 591 tests didn't get run
       [2014/03/24 16:49:16.534] replication ok for 9878 collections
       [2014/03/24 16:49:16.624] timed out
       [2014/03/24 16:49:16.624] Traceback (most recent call last):
       [2014/03/24 16:49:16.624]   File "buildscripts/smoke.py", line 1267, in <module>
       [2014/03/24 16:49:16.624]     main()
       [2014/03/24 16:49:16.624]   File "buildscripts/smoke.py", line 1251, in main
       [2014/03/24 16:49:16.624]     run_tests(tests)
       [2014/03/24 16:49:16.624]   File "buildscripts/smoke.py", line 696, in run_tests
       [2014/03/24 16:49:16.624]     master.wait_for_repl()
       [2014/03/24 16:49:16.624]   File "buildscripts/smoke.py", line 306, in wait_for_repl
       [2014/03/24 16:49:16.627]     Connection(port=self.port).testing.smokeWait.insert({}, w=2, wtimeout=5*60*1000)
       [2014/03/24 16:49:16.627]   File "C:\Python27\lib\site-packages\pymongo\collection.py", line 362, in insert
       [2014/03/24 16:49:16.627]     self.database.connection)
       [2014/03/24 16:49:16.627]   File "C:\Python27\lib\site-packages\pymongo\message.py", line 248, in _do_batched_insert
       [2014/03/24 16:49:16.627]     client._send_message(_insert_message(EMPTY.join(data), safe), safe)
       [2014/03/24 16:49:16.627]   File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 969, in _send_message
       [2014/03/24 16:49:16.627]     rv = self.__check_response_to_last_error(response)
       [2014/03/24 16:49:16.627]   File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 911, in __check_response_to_last_error
       [2014/03/24 16:49:16.627]     raise OperationFailure(details["err"], details["code"])
       [2014/03/24 16:49:16.627] pymongo.errors.OperationFailure: timeout
       [2014/03/24 16:49:16.656] Running script task for command 
      ${killall_mci|pkill mongo; pkill -9 mongod; pkill -9 mongos; pkill -f buildlogger.py; pkill -f smoke.py}
       in directory /data/mci/git@github.commongodb/mongo.git/mongodb-mongo-v2.6
       [2014/03/24 16:49:16.656] Full command: taskkill /IM mongod.exe /IM mongos.exe /IM mongo.exe /IM test.exe /IM buildlogger.py /IM smoke.py /IM python.exe /IM cl.exe /f
       [2014/03/24 16:49:16.658] Command successfully started and appended to running commands

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              milkie Eric Milkie
              Reporter:
              rassi J Rassi
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: