Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-15087

Server crashes when running concurrent mapReduce and dropDatabase commands

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.6.4, 2.7.5
    • Fix Version/s: 2.6.5, 2.7.7
    • Component/s: MapReduce
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:

      Description

      Issue Status as of Sep 18, 2014

      ISSUE SUMMARY
      If a database is dropped while in use by a mapReduce operation, the mongod process crashes.

      USER IMPACT
      mongod process crashes and needs to be restarted.

      WORKAROUNDS
      Before dropping any database used by mapReduce operations, either interrupt the relevant mapReduce operations or wait for them to finish.

      AFFECTED VERSIONS
      MongoDB 2.6 production releases up to 2.6.4 are affected by this issue.

      FIX VERSION
      The fix is included in the 2.6.5 production release.

      RESOLUTION DETAILS
      Abort mapReduce operations with an error message if the output database has disappeared.

      Original description

      I can reproduce this crash with 2.6.4 and master (bd50249d) but not 2.4.11.

      Server output before the crash:

      2014-08-28T19:35:05.984-0400 D COMMANDS [conn3] mr ns: testdb.mr1
      2014-08-28T19:35:05.984-0400 D COMMANDS [conn3] run command testdb.$cmd { listCollections: 1, filter: { name: "mr1" } }
      2014-08-28T19:35:06.013-0400 D COMMANDS [conn2] run command testdb.$cmd { dropDatabase: 1.0 }
      2014-08-28T19:35:06.064-0400 I COMMANDS [conn2] dropDatabase testdb starting
      2014-08-28T19:35:06.064-0400 I QUERY    [conn3] command testdb.$cmd command: listCollections { listCollections: 1, filter: { name: "mr1" } } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) r:79551 reslen:102 79ms
      2014-08-28T19:35:06.064-0400 D QUERY    [conn3] V8Scope 0x104cac780 unregistered for op 0
      2014-08-28T19:35:06.064-0400 D QUERY    [conn3] V8Scope 0x104cac780 registered for op 0
      2014-08-28T19:35:06.064-0400 D COMMANDS [conn3] run command testdb.$cmd { drop: "tmp.mr.mr1_69" }
      2014-08-28T19:35:06.064-0400 I COMMANDS [conn3] CMD: drop testdb.tmp.mr.mr1_69
      2014-08-28T19:35:06.065-0400 D STORAGE  [conn2] dropDatabase testdb
      2014-08-28T19:35:06.065-0400 D JOURNAL  [conn2] journal WRITETODATAFILES 0.097ms
      2014-08-28T19:35:06.068-0400 D JOURNAL  [conn2] journal REMAPPRIVATEVIEW done startedAt: 4 n:1 2ms
      2014-08-28T19:35:06.072-0400 I JOURNAL  [conn2] removeJournalFiles
      2014-08-28T19:35:06.072-0400 D JOURNAL  [conn2] removeJournalFiles end
      2014-08-28T19:35:06.072-0400 D JOURNAL  [conn2] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
      2014-08-28T19:35:06.074-0400 D STORAGE  [conn2] remove file /data/db/testdb.ns
      2014-08-28T19:35:06.074-0400 I COMMANDS [conn2] dropDatabase testdb finished
      2014-08-28T19:35:06.074-0400 I QUERY    [conn2] command testdb.$cmd command: dropDatabase { dropDatabase: 1.0 } keyUpdates:0 numYields:0 locks(micros) W:10818 reslen:57 61ms
      2014-08-28T19:35:06.075-0400 I STORAGE  [conn3] opening db: testdb
      2014-08-28T19:35:06.075-0400 I QUERY    [conn3] command testdb.$cmd command: drop { drop: "tmp.mr.mr1_69" } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) w:213 reslen:62 10ms
      2014-08-28T19:35:06.075-0400 D COMMANDS [conn3] run command testdb.$cmd { drop: "tmp.mr.mr1_69_inc" }
      2014-08-28T19:35:06.075-0400 I COMMANDS [conn3] CMD: drop testdb.tmp.mr.mr1_69_inc
      2014-08-28T19:35:06.075-0400 I QUERY    [conn3] command testdb.$cmd command: drop { drop: "tmp.mr.mr1_69_inc" } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) w:9 reslen:62 0ms
      2014-08-28T19:35:06.075-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/testdb.ns, filling with zeroes...
      2014-08-28T19:35:06.092-0400 I STORAGE  [FileAllocator] done allocating datafile /data/db/testdb.ns, size: 16MB,  took 0.016 secs
      2014-08-28T19:35:06.143-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/testdb.0, filling with zeroes...
      2014-08-28T19:35:06.258-0400 D COMMANDS [conn2] run command testdb.$cmd { dropDatabase: 1.0 }
      2014-08-28T19:35:06.294-0400 I STORAGE  [FileAllocator] done allocating datafile /data/db/testdb.0, size: 64MB,  took 0.151 secs
      2014-08-28T19:35:06.330-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:2000
      2014-08-28T19:35:06.330-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:4000
      2014-08-28T19:35:06.337-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:6000
      2014-08-28T19:35:06.345-0400 D STORAGE  [conn3] allocating new extent
      2014-08-28T19:35:06.345-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:131072 fromFreeList: 0 eloc: 0:8000
      2014-08-28T19:35:06.346-0400 D STORAGE  [conn3] testdb.tmp.mr.mr1_69_inc: clearing plan cache - collection info cache reset
      2014-08-28T19:35:06.346-0400 I COMMANDS [conn2] dropDatabase testdb starting
      2014-08-28T19:35:06.411-0400 D STORAGE  [conn2] dropDatabase testdb
      2014-08-28T19:35:06.412-0400 D JOURNAL  [conn2] lsn set 98963
      2014-08-28T19:35:06.412-0400 D JOURNAL  [conn2] journal WRITETODATAFILES 0.116ms
      2014-08-28T19:35:06.412-0400 D JOURNAL  [conn2] journal REMAPPRIVATEVIEW done startedAt: 1 n:1 0ms
      2014-08-28T19:35:06.418-0400 I JOURNAL  [conn2] removeJournalFiles
      2014-08-28T19:35:06.418-0400 D JOURNAL  [conn2] removeJournalFiles end
      2014-08-28T19:35:06.418-0400 D JOURNAL  [conn2] journal REMAPPRIVATEVIEW done startedAt: 2 n:1 0ms
      2014-08-28T19:35:06.420-0400 D STORAGE  [conn2] remove file /data/db/testdb.ns
      2014-08-28T19:35:06.420-0400 I COMMANDS [conn2] dropDatabase testdb finished
      2014-08-28T19:35:06.420-0400 I STORAGE  [conn3] opening db: testdb
      2014-08-28T19:35:06.420-0400 I QUERY    [conn2] command testdb.$cmd command: dropDatabase { dropDatabase: 1.0 } keyUpdates:0 numYields:0 locks(micros) W:74035 reslen:57 161ms
      2014-08-28T19:35:06.420-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/testdb.ns, filling with zeroes...
      2014-08-28T19:35:06.437-0400 I STORAGE  [FileAllocator] done allocating datafile /data/db/testdb.ns, size: 16MB,  took 0.016 secs
      2014-08-28T19:35:06.484-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/testdb.0, filling with zeroes...
      2014-08-28T19:35:06.637-0400 I STORAGE  [FileAllocator] done allocating datafile /data/db/testdb.0, size: 64MB,  took 0.152 secs
      2014-08-28T19:35:06.679-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:2000
      2014-08-28T19:35:06.679-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:4000
      2014-08-28T19:35:06.687-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:6000
      2014-08-28T19:35:06.692-0400 D STORAGE  [conn3] allocating new extent
      2014-08-28T19:35:06.692-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:131072 fromFreeList: 0 eloc: 0:8000
      2014-08-28T19:35:06.692-0400 D STORAGE  [conn3] testdb.tmp.mr.mr1_69: clearing plan cache - collection info cache reset
      2014-08-28T19:35:06.692-0400 D QUERY    [conn3] Collection testdb.mr1 does not exist. Using EOF plan: query: {} sort: {} projection: {} skip: 0 limit: 0
      2014-08-28T19:35:06.694-0400 F -        [conn3] Invalid access at address: 0xd8
      2014-08-28T19:35:06.694-0400 D JOURNAL  [journal] lsn set 99293
      2014-08-28T19:35:06.694-0400 D JOURNAL  [journal] journal WRITETODATAFILES 0.132ms
      2014-08-28T19:35:06.696-0400 F -        [conn3] Got signal: 11 (Segmentation fault: 11).

      Valgrind output:

      ==31589== Invalid read of size 8
      ==31589==    at 0x11DBEA8: std::vector<mongo::IndexCatalogEntry*, std::allocator<mongo::IndexCatalogEntry*> >::begin() const (stl_vector.h:548)
      ==31589==    by 0x11DB547: mongo::IndexCatalogEntryContainer::begin() const (index_catalog_entry.h:125)
      ==31589==    by 0x11D9C27: mongo::IndexCatalog::IndexIterator::IndexIterator(mongo::OperationContext*, mongo::IndexCatalog const*, bool) (index_catalog.cpp:843)
      ==31589==    by 0x11BF578: mongo::IndexCatalog::getIndexIterator(mongo::OperationContext*, bool) const (index_catalog.h:154)
      ==31589==    by 0x1227A26: mongo::mr::State::finalReduce(mongo::CurOp*, mongo::ProgressMeterHolder&) (mr.cpp:966)
      ==31589==    by 0x122B550: mongo::mr::MapReduceCommand::run(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (mr.cpp:1433)
      ==31589==    by 0x1283F64: mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1156)
      ==31589==    by 0x1284EA6: mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1369)
      ==31589==    by 0x1285780: mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (dbcommands.cpp:1443)
      ==31589==    by 0x149DD6B: mongo::runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::CurOp&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (new_find.cpp:128)
      ==31589==    by 0x149F663: mongo::newRunQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&) (new_find.cpp:469)
      ==31589==    by 0x1385C3F: mongo::receivedQuery(mongo::OperationContext*, mongo::Client&, mongo::DbResponse&, mongo::Message&) (instance.cpp:262)
      ==31589==    by 0x1386D43: mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) (instance.cpp:436)
      ==31589==    by 0x10AECB9: mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) (db.cpp:196)
      ==31589==    by 0x17F7DC6: mongo::PortMessageServer::handleIncomingMsg(void*) (message_server_port.cpp:227)
      ==31589==    by 0x4E3F181: start_thread (pthread_create.c:312)
      ==31589==    by 0x5D7BFBC: clone (clone.S:111)
      ==31589==  Address 0xd8 is not stack'd, malloc'd or (recently) free'd
      ==31589== 
      2014-08-28T17:48:42.128-0400 F -        [conn148] Invalid access at address: 0xd8
      2014-08-28T17:48:42.145-0400 I NETWORK  [conn151] end connection 127.0.0.1:44054 (1 connection now open)
      2014-08-28T17:48:42.265-0400 F -        [conn148] Got signal: 11 (Segmentation fault).

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                8 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: