[SERVER-15087] Server crashes when running concurrent mapReduce and dropDatabase commands Created: 28/Aug/14  Updated: 11/Mar/15  Resolved: 17/Sep/14

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: 2.6.4, 2.7.5
Fix Version/s: 2.6.5, 2.7.7

Type: Bug Priority: Major - P3
Reporter: Kamran K. Assignee: Mathias Stearn
Resolution: Done Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-15842 Server aborts during concurrent mapre... Closed
Tested
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   
Issue Status as of Sep 18, 2014

ISSUE SUMMARY
If a database is dropped while in use by a mapReduce operation, the mongod process crashes.

USER IMPACT
mongod process crashes and needs to be restarted.

WORKAROUNDS
Before dropping any database used by mapReduce operations, either interrupt the relevant mapReduce operations or wait for them to finish.

AFFECTED VERSIONS
MongoDB 2.6 production releases up to 2.6.4 are affected by this issue.

FIX VERSION
The fix is included in the 2.6.5 production release.

RESOLUTION DETAILS
Abort mapReduce operations with an error message if the output database has disappeared.

Original description

I can reproduce this crash with 2.6.4 and master (bd50249d) but not 2.4.11.

Server output before the crash:

2014-08-28T19:35:05.984-0400 D COMMANDS [conn3] mr ns: testdb.mr1
2014-08-28T19:35:05.984-0400 D COMMANDS [conn3] run command testdb.$cmd { listCollections: 1, filter: { name: "mr1" } }
2014-08-28T19:35:06.013-0400 D COMMANDS [conn2] run command testdb.$cmd { dropDatabase: 1.0 }
2014-08-28T19:35:06.064-0400 I COMMANDS [conn2] dropDatabase testdb starting
2014-08-28T19:35:06.064-0400 I QUERY    [conn3] command testdb.$cmd command: listCollections { listCollections: 1, filter: { name: "mr1" } } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) r:79551 reslen:102 79ms
2014-08-28T19:35:06.064-0400 D QUERY    [conn3] V8Scope 0x104cac780 unregistered for op 0
2014-08-28T19:35:06.064-0400 D QUERY    [conn3] V8Scope 0x104cac780 registered for op 0
2014-08-28T19:35:06.064-0400 D COMMANDS [conn3] run command testdb.$cmd { drop: "tmp.mr.mr1_69" }
2014-08-28T19:35:06.064-0400 I COMMANDS [conn3] CMD: drop testdb.tmp.mr.mr1_69
2014-08-28T19:35:06.065-0400 D STORAGE  [conn2] dropDatabase testdb
2014-08-28T19:35:06.065-0400 D JOURNAL  [conn2] journal WRITETODATAFILES 0.097ms
2014-08-28T19:35:06.068-0400 D JOURNAL  [conn2] journal REMAPPRIVATEVIEW done startedAt: 4 n:1 2ms
2014-08-28T19:35:06.072-0400 I JOURNAL  [conn2] removeJournalFiles
2014-08-28T19:35:06.072-0400 D JOURNAL  [conn2] removeJournalFiles end
2014-08-28T19:35:06.072-0400 D JOURNAL  [conn2] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
2014-08-28T19:35:06.074-0400 D STORAGE  [conn2] remove file /data/db/testdb.ns
2014-08-28T19:35:06.074-0400 I COMMANDS [conn2] dropDatabase testdb finished
2014-08-28T19:35:06.074-0400 I QUERY    [conn2] command testdb.$cmd command: dropDatabase { dropDatabase: 1.0 } keyUpdates:0 numYields:0 locks(micros) W:10818 reslen:57 61ms
2014-08-28T19:35:06.075-0400 I STORAGE  [conn3] opening db: testdb
2014-08-28T19:35:06.075-0400 I QUERY    [conn3] command testdb.$cmd command: drop { drop: "tmp.mr.mr1_69" } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) w:213 reslen:62 10ms
2014-08-28T19:35:06.075-0400 D COMMANDS [conn3] run command testdb.$cmd { drop: "tmp.mr.mr1_69_inc" }
2014-08-28T19:35:06.075-0400 I COMMANDS [conn3] CMD: drop testdb.tmp.mr.mr1_69_inc
2014-08-28T19:35:06.075-0400 I QUERY    [conn3] command testdb.$cmd command: drop { drop: "tmp.mr.mr1_69_inc" } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) w:9 reslen:62 0ms
2014-08-28T19:35:06.075-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/testdb.ns, filling with zeroes...
2014-08-28T19:35:06.092-0400 I STORAGE  [FileAllocator] done allocating datafile /data/db/testdb.ns, size: 16MB,  took 0.016 secs
2014-08-28T19:35:06.143-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/testdb.0, filling with zeroes...
2014-08-28T19:35:06.258-0400 D COMMANDS [conn2] run command testdb.$cmd { dropDatabase: 1.0 }
2014-08-28T19:35:06.294-0400 I STORAGE  [FileAllocator] done allocating datafile /data/db/testdb.0, size: 64MB,  took 0.151 secs
2014-08-28T19:35:06.330-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:2000
2014-08-28T19:35:06.330-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:4000
2014-08-28T19:35:06.337-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:6000
2014-08-28T19:35:06.345-0400 D STORAGE  [conn3] allocating new extent
2014-08-28T19:35:06.345-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:131072 fromFreeList: 0 eloc: 0:8000
2014-08-28T19:35:06.346-0400 D STORAGE  [conn3] testdb.tmp.mr.mr1_69_inc: clearing plan cache - collection info cache reset
2014-08-28T19:35:06.346-0400 I COMMANDS [conn2] dropDatabase testdb starting
2014-08-28T19:35:06.411-0400 D STORAGE  [conn2] dropDatabase testdb
2014-08-28T19:35:06.412-0400 D JOURNAL  [conn2] lsn set 98963
2014-08-28T19:35:06.412-0400 D JOURNAL  [conn2] journal WRITETODATAFILES 0.116ms
2014-08-28T19:35:06.412-0400 D JOURNAL  [conn2] journal REMAPPRIVATEVIEW done startedAt: 1 n:1 0ms
2014-08-28T19:35:06.418-0400 I JOURNAL  [conn2] removeJournalFiles
2014-08-28T19:35:06.418-0400 D JOURNAL  [conn2] removeJournalFiles end
2014-08-28T19:35:06.418-0400 D JOURNAL  [conn2] journal REMAPPRIVATEVIEW done startedAt: 2 n:1 0ms
2014-08-28T19:35:06.420-0400 D STORAGE  [conn2] remove file /data/db/testdb.ns
2014-08-28T19:35:06.420-0400 I COMMANDS [conn2] dropDatabase testdb finished
2014-08-28T19:35:06.420-0400 I STORAGE  [conn3] opening db: testdb
2014-08-28T19:35:06.420-0400 I QUERY    [conn2] command testdb.$cmd command: dropDatabase { dropDatabase: 1.0 } keyUpdates:0 numYields:0 locks(micros) W:74035 reslen:57 161ms
2014-08-28T19:35:06.420-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/testdb.ns, filling with zeroes...
2014-08-28T19:35:06.437-0400 I STORAGE  [FileAllocator] done allocating datafile /data/db/testdb.ns, size: 16MB,  took 0.016 secs
2014-08-28T19:35:06.484-0400 I STORAGE  [FileAllocator] allocating new datafile /data/db/testdb.0, filling with zeroes...
2014-08-28T19:35:06.637-0400 I STORAGE  [FileAllocator] done allocating datafile /data/db/testdb.0, size: 64MB,  took 0.152 secs
2014-08-28T19:35:06.679-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:2000
2014-08-28T19:35:06.679-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:4000
2014-08-28T19:35:06.687-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:8192 fromFreeList: 0 eloc: 0:6000
2014-08-28T19:35:06.692-0400 D STORAGE  [conn3] allocating new extent
2014-08-28T19:35:06.692-0400 D STORAGE  [conn3] MmapV1ExtentManager::allocateExtent desiredSize:131072 fromFreeList: 0 eloc: 0:8000
2014-08-28T19:35:06.692-0400 D STORAGE  [conn3] testdb.tmp.mr.mr1_69: clearing plan cache - collection info cache reset
2014-08-28T19:35:06.692-0400 D QUERY    [conn3] Collection testdb.mr1 does not exist. Using EOF plan: query: {} sort: {} projection: {} skip: 0 limit: 0
2014-08-28T19:35:06.694-0400 F -        [conn3] Invalid access at address: 0xd8
2014-08-28T19:35:06.694-0400 D JOURNAL  [journal] lsn set 99293
2014-08-28T19:35:06.694-0400 D JOURNAL  [journal] journal WRITETODATAFILES 0.132ms
2014-08-28T19:35:06.696-0400 F -        [conn3] Got signal: 11 (Segmentation fault: 11).

Valgrind output:

==31589== Invalid read of size 8
==31589==    at 0x11DBEA8: std::vector<mongo::IndexCatalogEntry*, std::allocator<mongo::IndexCatalogEntry*> >::begin() const (stl_vector.h:548)
==31589==    by 0x11DB547: mongo::IndexCatalogEntryContainer::begin() const (index_catalog_entry.h:125)
==31589==    by 0x11D9C27: mongo::IndexCatalog::IndexIterator::IndexIterator(mongo::OperationContext*, mongo::IndexCatalog const*, bool) (index_catalog.cpp:843)
==31589==    by 0x11BF578: mongo::IndexCatalog::getIndexIterator(mongo::OperationContext*, bool) const (index_catalog.h:154)
==31589==    by 0x1227A26: mongo::mr::State::finalReduce(mongo::CurOp*, mongo::ProgressMeterHolder&) (mr.cpp:966)
==31589==    by 0x122B550: mongo::mr::MapReduceCommand::run(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (mr.cpp:1433)
==31589==    by 0x1283F64: mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1156)
==31589==    by 0x1284EA6: mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1369)
==31589==    by 0x1285780: mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (dbcommands.cpp:1443)
==31589==    by 0x149DD6B: mongo::runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::CurOp&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (new_find.cpp:128)
==31589==    by 0x149F663: mongo::newRunQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&) (new_find.cpp:469)
==31589==    by 0x1385C3F: mongo::receivedQuery(mongo::OperationContext*, mongo::Client&, mongo::DbResponse&, mongo::Message&) (instance.cpp:262)
==31589==    by 0x1386D43: mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) (instance.cpp:436)
==31589==    by 0x10AECB9: mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) (db.cpp:196)
==31589==    by 0x17F7DC6: mongo::PortMessageServer::handleIncomingMsg(void*) (message_server_port.cpp:227)
==31589==    by 0x4E3F181: start_thread (pthread_create.c:312)
==31589==    by 0x5D7BFBC: clone (clone.S:111)
==31589==  Address 0xd8 is not stack'd, malloc'd or (recently) free'd
==31589== 
2014-08-28T17:48:42.128-0400 F -        [conn148] Invalid access at address: 0xd8
2014-08-28T17:48:42.145-0400 I NETWORK  [conn151] end connection 127.0.0.1:44054 (1 connection now open)
2014-08-28T17:48:42.265-0400 F -        [conn148] Got signal: 11 (Segmentation fault).



 Comments   
Comment by Githook User [ 17/Sep/14 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-15087 Get owned copy of document in MR final reduce

(cherry picked from commit 92bf7260c3dda79abf9f6b2bdf9e7fb827f6743a)
Branch: v2.6
https://github.com/mongodb/mongo/commit/722777370bef11db921efff5b0a50242c881afd8

Comment by Githook User [ 17/Sep/14 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-15087 check for dropped collection in MR

(cherry picked from commit aa5c0a136c9ab37023d89908c8b02d6d687ed070)
Branch: v2.6
https://github.com/mongodb/mongo/commit/299b86caf30bae2c88cd51ae189a37bd9cdeea73

Comment by Githook User [ 09/Sep/14 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-15087 Get owned copy of document in MR final reduce
Branch: master
https://github.com/mongodb/mongo/commit/92bf7260c3dda79abf9f6b2bdf9e7fb827f6743a

Comment by Githook User [ 09/Sep/14 ]

Author:

{u'username': u'RedBeard0531', u'name': u'Mathias Stearn', u'email': u'mathias@10gen.com'}

Message: SERVER-15087 check for dropped collection in MR
Branch: master
https://github.com/mongodb/mongo/commit/aa5c0a136c9ab37023d89908c8b02d6d687ed070

Generated at Thu Feb 08 03:36:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.