- Type: Bug
- Resolution: Done
- Priority: Critical - P2
- Affects Version/s: 2.6.0, 2.6.1-rc0
- Component/s: Querying
- ALL
ISSUE SUMMARY
Queries using the $or operator can in some cases stop the server with a fatal assertion if the collection or an index on the collection is dropped simultaneously. Specifically, the crash occurs if the $or query yields at the same time as the collection or index is dropped.
USER IMPACT
Server crashes can affect the quorum of a replica set and, in the worst case, lead to unavailability of the database. Because the assertion depends on when $or queries yield, its timing is hard to predict and the server can appear to crash unprompted. The server can be restarted and no recovery actions are required; however, the index or collection drop will have to be repeated. Users not using $or queries are not affected by the issue.
WORKAROUNDS
As a workaround for dropping a collection, empty the collection with remove() before dropping it. This will significantly reduce (but not eliminate) the chance of encountering the issue. The following illustrates how to employ the workaround in order to drop the collection named coll:
> db.coll.remove({})
WriteResult({ "nRemoved" : 100 })
> db.coll.drop()
true
There is no workaround for the case of dropping an index; users should therefore hold off on dropping an index while $or queries might be running.
RESOLUTION
For all child classes of the Runner class, implementations of Runner::kill() must guarantee the postcondition that subsequent calls to Runner::collection() will return NULL. SubplanRunner::kill() was not meeting this contract. This bug fix addresses this issue.
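The contract described above can be illustrated with a minimal sketch. This is not the MongoDB source: Collection, SimpleRunner, SubplanLikeRunner, and killAllAndCheck are hypothetical stand-ins invented for illustration, and only the names Runner, kill(), and collection() come from this ticket. The sketch assumes a runner that delegates to an underlying runner, and shows one way such a runner can satisfy the postcondition: by clearing its own collection pointer unconditionally in kill(), whether or not an underlying runner exists yet.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a collection (not the MongoDB class).
struct Collection {};

// Simplified interface modeled on the contract stated in this ticket.
class Runner {
public:
    virtual ~Runner() {}
    // Postcondition: after kill(), collection() must return NULL.
    virtual void kill() = 0;
    virtual const Collection* collection() const = 0;
};

// Hypothetical runner that reads directly from one collection.
class SimpleRunner : public Runner {
public:
    explicit SimpleRunner(const Collection* c) : _collection(c) {}
    void kill() override { _collection = nullptr; }
    const Collection* collection() const override { return _collection; }

private:
    const Collection* _collection;
};

// Hypothetical runner that may or may not have an underlying runner yet,
// for example because planning has not finished.
class SubplanLikeRunner : public Runner {
public:
    SubplanLikeRunner(const Collection* c, std::unique_ptr<Runner> underlying)
        : _collection(c), _underlying(std::move(underlying)) {}

    void kill() override {
        // Clear our own pointer first, independent of the underlying runner,
        // so the postcondition holds in every state.
        _collection = nullptr;
        if (_underlying) {
            _underlying->kill();
        }
    }

    const Collection* collection() const override { return _collection; }

private:
    const Collection* _collection;
    std::unique_ptr<Runner> _underlying;
};

// Kills every runner and reports whether all of them honor the contract.
// The real server enforces this with an invariant rather than a return value.
bool killAllAndCheck(const std::vector<Runner*>& runners) {
    bool allCleared = true;
    for (Runner* r : runners) {
        r->kill();
        if (r->collection() != nullptr) {
            allCleared = false;
        }
    }
    return allCleared;
}
```

Passing a list of runners to killAllAndCheck() returns true only if every runner reports a NULL collection after being killed. A runner that cleared its pointer only when an underlying runner was present would make it return false, which is the kind of condition the invariant in the server log reports.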
AFFECTED VERSIONS
Version 2.6.0 is affected by this issue.
PATCHES
The patch is included in the 2.6.1 production release.
Original description
A fatal assertion is triggered during cursor cache invalidation when the following occurs:
- An $or query that uses subplanning is in progress
- During a yield of the $or query, the collection is dropped or an index on the collection is dropped
As a workaround, empty the collection with remove() before dropping it. The following illustrates how to employ the workaround in order to drop the collection named coll:
> db.coll.remove({})
WriteResult({ "nRemoved" : 100 })
> db.coll.drop()
true
This workaround will significantly reduce (but not eliminate) the chance of encountering the issue.
Reproduce with the following shell script:
mongo --eval 'db.foo.ensureIndex({a:1}); db.foo.ensureIndex({b:1});'
mongo --eval 'for(var i=0;i<100;i++){ db.foo.insert({a:1,b:1}); }'
mongo --eval 'db.foo.find({$or:[{a:1,$where:function(){sleep(100);return false;}},{b:1}]}).next()' &
sleep 1
mongo --eval 'db.foo.drop()'
Output from the mongod log from the above:
2014-04-28T11:39:05.677-0400 [conn4] test Invariant failure runner->collection() == NULL src/mongo/db/catalog/collection_cursor_cache.cpp 305
2014-04-28T11:39:05.681-0400 [conn4] test 0x1006a248b 0x10065a122 0x100649e29 0x1000e01de 0x1000f06e2 0x1000e75b7 0x1001b7a75 0x1001b2cc5 0x1001b3a82 0x1001b478c 0x1003c85ef 0x10029a560 0x100006a84 0x100667831 0x1006d71f5 0x101cb7782 0x101ca41c1
 /Users/rassi/bin/mongod(_ZN5mongo15printStackTraceERSo+0x2b) [0x1006a248b]
 /Users/rassi/bin/mongod(_ZN5mongo10logContextEPKc+0x72) [0x10065a122]
 /Users/rassi/bin/mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xe9) [0x100649e29]
 /Users/rassi/bin/mongod(_ZN5mongo21CollectionCursorCache13invalidateAllEb+0x15c) [0x1000e01de]
 /Users/rassi/bin/mongod(_ZN5mongo12IndexCatalog14dropAllIndexesEb+0xa4) [0x1000f06e2]
 /Users/rassi/bin/mongod(_ZN5mongo8Database14dropCollectionERKNS_10StringDataE+0x2df) [0x1000e75b7]
 /Users/rassi/bin/mongod(_ZN5mongo7CmdDrop3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x421) [0x1001b7a75]
 /Users/rassi/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x25) [0x1001b2cc5]
 /Users/rassi/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xb86) [0x1001b3a82]
 /Users/rassi/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x56c) [0x1001b478c]
 /Users/rassi/bin/mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x64f) [0x1003c85ef]
 /Users/rassi/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x7b0) [0x10029a560]
 /Users/rassi/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x134) [0x100006a84]
 /Users/rassi/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x691) [0x100667831]
 /Users/rassi/bin/mongod(thread_proxy+0xe5) [0x1006d71f5]
 /usr/lib/system/libsystem_c.dylib(_pthread_start+0x147) [0x101cb7782]
 /usr/lib/system/libsystem_c.dylib(thread_start+0xd) [0x101ca41c1]
2014-04-28T11:39:05.681-0400 [conn4] ***aborting after invariant() failure
2014-04-28T11:39:05.686-0400 [conn4] SEVERE: Got signal: 6 (Abort trap: 6). Backtrace:0x1006a248b 0x1006a21cf 0x101ca592a 0 0x101cfcdfa 0x100649e9b 0x1000e01de 0x1000f06e2 0x1000e75b7 0x1001b7a75 0x1001b2cc5 0x1001b3a82 0x1001b478c 0x1003c85ef 0x10029a560 0x100006a84 0x100667831 0x1006d71f5 0x101cb7782 0x101ca41c1
 /Users/rassi/bin/mongod(_ZN5mongo15printStackTraceERSo+0x2b) [0x1006a248b]
 /Users/rassi/bin/mongod(_ZN5mongo12_GLOBAL__N_110abruptQuitEi+0xbf) [0x1006a21cf]
 /usr/lib/system/libsystem_c.dylib(_sigtramp+0x1a) [0x101ca592a]
 ??? [0]
 /usr/lib/system/libsystem_c.dylib(abort+0x8f) [0x101cfcdfa]
 /Users/rassi/bin/mongod(_ZN5mongo15invariantFailedEPKcS1_j+0x15b) [0x100649e9b]
 /Users/rassi/bin/mongod(_ZN5mongo21CollectionCursorCache13invalidateAllEb+0x15c) [0x1000e01de]
 /Users/rassi/bin/mongod(_ZN5mongo12IndexCatalog14dropAllIndexesEb+0xa4) [0x1000f06e2]
 /Users/rassi/bin/mongod(_ZN5mongo8Database14dropCollectionERKNS_10StringDataE+0x2df) [0x1000e75b7]
 /Users/rassi/bin/mongod(_ZN5mongo7CmdDrop3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x421) [0x1001b7a75]
 /Users/rassi/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x25) [0x1001b2cc5]
 /Users/rassi/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xb86) [0x1001b3a82]
 /Users/rassi/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x56c) [0x1001b478c]
 /Users/rassi/bin/mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x64f) [0x1003c85ef]
 /Users/rassi/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x7b0) [0x10029a560]
 /Users/rassi/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x134) [0x100006a84]
 /Users/rassi/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x691) [0x100667831]
 /Users/rassi/bin/mongod(thread_proxy+0xe5) [0x1006d71f5]
 /usr/lib/system/libsystem_c.dylib(_pthread_start+0x147) [0x101cb7782]
 /usr/lib/system/libsystem_c.dylib(thread_start+0xd) [0x101ca41c1]
Originally reported in mongodb-user thread: <https://groups.google.com/forum/#!topic/mongodb-user/Z-gDnxhTGio>.
- related to: SERVER-13796 Increase dbtest coverage for Runner subclasses (Closed)