[SERVER-13766] Dropping index or collection while $or query is yielding triggers fatal assertion
Created: 28/Apr/14  Updated: 11/Jul/16  Resolved: 30/Apr/14

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.0, 2.6.1-rc0
Fix Version/s: 2.6.1, 2.7.0

Type: Bug
Priority: Critical - P2
Reporter: J Rassi
Assignee: J Rassi
Resolution: Done
Votes: 0
Labels: cap-ticket-needed
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Related
related to SERVER-13796 Increase dbtest coverage for Runner s... Closed
Tested
Operating System: ALL
Backport Completed:
Participants:

 Description   
Issue Status as of April 30, 2014

ISSUE SUMMARY
Queries using the $or operator can, in some cases, stop the server with a fatal assertion if the collection or one of its indexes is dropped while the query is in progress. Specifically, the crash occurs when the drop happens while the $or query is yielding.

USER IMPACT
Server crashes can affect the quorum of a replica set and, in the worst case, make the database unavailable. Because the assertion depends on when $or queries yield, its timing is hard to predict and the server can appear to crash unprompted. The server can be restarted and no recovery actions are required, but the index or collection drop will have to be repeated. Users who do not run $or queries are not affected by this issue.

WORKAROUNDS
As a workaround for dropping a collection, empty the collection with remove() before dropping it. This will significantly reduce (but not eliminate) the chance of encountering the issue. The following illustrates how to employ the workaround in order to drop the collection named coll:

> db.coll.remove({})
WriteResult({ "nRemoved" : 100 })
> db.coll.drop()
true

There is no workaround for dropping an index. Users should therefore avoid dropping an index while $or queries might be running.

RESOLUTION
For all child classes of the Runner class, implementations of Runner::kill() must guarantee the postcondition that subsequent calls to Runner::collection() will return NULL. SubplanRunner::kill() was not meeting this contract. The fix makes SubplanRunner::kill() clear its _collection member so that the contract holds.
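The contract described above can be illustrated with a minimal, self-contained C++ sketch. This is not the server's code: the class names BuggySubplanRunner and FixedSubplanRunner, the empty Collection struct, and the helper killAllAndCheck() are hypothetical stand-ins chosen for illustration, and the sketch uses nullptr where the ticket says NULL. It only models the idea that kill() must leave collection() returning a null pointer, and that a check made after killing each runner will fail if a subclass forgets to clear its pointer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the server's collection type.
struct Collection {};

// Simplified model of the Runner interface discussed in this ticket.
class Runner {
public:
    virtual ~Runner() {}
    // Contract: after kill(), collection() must return a null pointer.
    virtual void kill() = 0;
    virtual const Collection* collection() const = 0;
};

// Models the pre-fix behaviour: kill() marks the runner as dead
// but leaves the collection pointer in place.
class BuggySubplanRunner : public Runner {
public:
    explicit BuggySubplanRunner(const Collection* c) : _collection(c), _killed(false) {}
    void kill() override { _killed = true; }
    const Collection* collection() const override { return _collection; }
    bool killed() const { return _killed; }
private:
    const Collection* _collection;
    bool _killed;
};

// Models the post-fix behaviour: kill() also clears _collection.
class FixedSubplanRunner : public Runner {
public:
    explicit FixedSubplanRunner(const Collection* c) : _collection(c), _killed(false) {}
    void kill() override {
        _killed = true;
        _collection = nullptr;
    }
    const Collection* collection() const override { return _collection; }
    bool killed() const { return _killed; }
private:
    const Collection* _collection;
    bool _killed;
};

// Kills every runner and verifies the postcondition. Returns false at the
// first runner that violates it; in this model, that is the point where the
// server's invariant would fail.
bool killAllAndCheck(const std::vector<Runner*>& runners) {
    for (Runner* runner : runners) {
        runner->kill();
        if (runner->collection() != nullptr) {
            return false;
        }
    }
    return true;
}
```

In this sketch, calling killAllAndCheck() on a vector holding a FixedSubplanRunner returns true, while a vector holding a BuggySubplanRunner returns false. Returning a bool rather than aborting is a design choice for the example only, so that both outcomes can be observed in one process.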

AFFECTED VERSIONS
Version 2.6.0 is affected by this issue.

PATCHES
The patch is included in the 2.6.1 production release.

Original description

A fatal assertion is triggered during cursor cache invalidation when the following occurs:

  • An $or query that uses subplanning is in progress
  • During a yield of the $or query, the collection is dropped or an index on the collection is dropped

As a workaround, empty the collection with remove() before dropping it. The following illustrates how to employ the workaround in order to drop the collection named coll:

> db.coll.remove({})
WriteResult({ "nRemoved" : 100 })
> db.coll.drop()
true

This workaround will significantly reduce (but not eliminate) the chance of encountering the issue.

Reproduce with the following shell script:

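# Index both fields used by the $or branches.
mongo --eval 'db.foo.ensureIndex({a:1}); db.foo.ensureIndex({b:1});' ;
# Insert 100 documents that match both branches.
mongo --eval 'for(var i=0;i<100;i++){ db.foo.insert({a:1,b:1}); }' ;
# Run a slow $or query in the background; the $where sleep keeps it in progress.
mongo --eval 'db.foo.find({$or:[{a:1,$where:function(){sleep(100);return false;}},{b:1}]}).next()' &
# Give the query time to start, then drop the collection while it is running.
sleep 1 ;
mongo --eval 'db.foo.drop()'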

Output from the mongod log from the above:

2014-04-28T11:39:05.677-0400 [conn4] test Invariant failure runner->collection() == NULL src/mongo/db/catalog/collection_cursor_cache.cpp 305
2014-04-28T11:39:05.681-0400 [conn4] test 0x1006a248b 0x10065a122 0x100649e29 0x1000e01de 0x1000f06e2 0x1000e75b7 0x1001b7a75 0x1001b2cc5 0x1001b3a82 0x1001b478c 0x1003c85ef 0x10029a560 0x100006a84 0x100667831 0x1006d71f5 0x101cb7782 0x101ca41c1
 /Users/rassi/bin/mongod(_ZN5mongo15printStackTraceERSo+0x2b) [0x1006a248b]
 /Users/rassi/bin/mongod(_ZN5mongo10logContextEPKc+0x72) [0x10065a122]
 /Users/rassi/bin/mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xe9) [0x100649e29]
 /Users/rassi/bin/mongod(_ZN5mongo21CollectionCursorCache13invalidateAllEb+0x15c) [0x1000e01de]
 /Users/rassi/bin/mongod(_ZN5mongo12IndexCatalog14dropAllIndexesEb+0xa4) [0x1000f06e2]
 /Users/rassi/bin/mongod(_ZN5mongo8Database14dropCollectionERKNS_10StringDataE+0x2df) [0x1000e75b7]
 /Users/rassi/bin/mongod(_ZN5mongo7CmdDrop3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x421) [0x1001b7a75]
 /Users/rassi/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x25) [0x1001b2cc5]
 /Users/rassi/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xb86) [0x1001b3a82]
 /Users/rassi/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x56c) [0x1001b478c]
 /Users/rassi/bin/mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x64f) [0x1003c85ef]
 /Users/rassi/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x7b0) [0x10029a560]
 /Users/rassi/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x134) [0x100006a84]
 /Users/rassi/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x691) [0x100667831]
 /Users/rassi/bin/mongod(thread_proxy+0xe5) [0x1006d71f5]
 /usr/lib/system/libsystem_c.dylib(_pthread_start+0x147) [0x101cb7782]
 /usr/lib/system/libsystem_c.dylib(thread_start+0xd) [0x101ca41c1]
2014-04-28T11:39:05.681-0400 [conn4]
 
***aborting after invariant() failure
 
 
2014-04-28T11:39:05.686-0400 [conn4] SEVERE: Got signal: 6 (Abort trap: 6).
Backtrace:0x1006a248b 0x1006a21cf 0x101ca592a 0 0x101cfcdfa 0x100649e9b 0x1000e01de 0x1000f06e2 0x1000e75b7 0x1001b7a75 0x1001b2cc5 0x1001b3a82 0x1001b478c 0x1003c85ef 0x10029a560 0x100006a84 0x100667831 0x1006d71f5 0x101cb7782 0x101ca41c1
 /Users/rassi/bin/mongod(_ZN5mongo15printStackTraceERSo+0x2b) [0x1006a248b]
 /Users/rassi/bin/mongod(_ZN5mongo12_GLOBAL__N_110abruptQuitEi+0xbf) [0x1006a21cf]
 /usr/lib/system/libsystem_c.dylib(_sigtramp+0x1a) [0x101ca592a]
 ??? [0]
 /usr/lib/system/libsystem_c.dylib(abort+0x8f) [0x101cfcdfa]
 /Users/rassi/bin/mongod(_ZN5mongo15invariantFailedEPKcS1_j+0x15b) [0x100649e9b]
 /Users/rassi/bin/mongod(_ZN5mongo21CollectionCursorCache13invalidateAllEb+0x15c) [0x1000e01de]
 /Users/rassi/bin/mongod(_ZN5mongo12IndexCatalog14dropAllIndexesEb+0xa4) [0x1000f06e2]
 /Users/rassi/bin/mongod(_ZN5mongo8Database14dropCollectionERKNS_10StringDataE+0x2df) [0x1000e75b7]
 /Users/rassi/bin/mongod(_ZN5mongo7CmdDrop3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x421) [0x1001b7a75]
 /Users/rassi/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x25) [0x1001b2cc5]
 /Users/rassi/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xb86) [0x1001b3a82]
 /Users/rassi/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x56c) [0x1001b478c]
 /Users/rassi/bin/mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x64f) [0x1003c85ef]
 /Users/rassi/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x7b0) [0x10029a560]
 /Users/rassi/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x134) [0x100006a84]
 /Users/rassi/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x691) [0x100667831]
 /Users/rassi/bin/mongod(thread_proxy+0xe5) [0x1006d71f5]
 /usr/lib/system/libsystem_c.dylib(_pthread_start+0x147) [0x101cb7782]
 /usr/lib/system/libsystem_c.dylib(thread_start+0xd) [0x101ca41c1]

Originally reported in mongodb-user thread: <https://groups.google.com/forum/#!topic/mongodb-user/Z-gDnxhTGio>.



 Comments   
Comment by Githook User [ 30/Apr/14 ]

Author: Jason Rassi (jrassi) <rassi@10gen.com>

Message: SERVER-13766 SubplanRunner::kill() needs to clear _collection

For all child classes of Runner, implementations of Runner::kill()
must guarantee the postcondition that subsequent calls to
Runner::collection() will return NULL. SubplanRunner:kill() was not
meeting this contract.

(cherry picked from commit 1d98478d9d529e886143415cbb5b507362ab45eb)
Branch: v2.6
https://github.com/mongodb/mongo/commit/83460112e794277ef4aee3612773bf237144ee84

Comment by Githook User [ 28/Apr/14 ]

Author: Jason Rassi (jrassi) <rassi@10gen.com>

Message: SERVER-13766 SubplanRunner::kill() needs to clear _collection

For all child classes of Runner, implementations of Runner::kill()
must guarantee the postcondition that subsequent calls to
Runner::collection() will return NULL. SubplanRunner:kill() was not
meeting this contract.

Branch: master
https://github.com/mongodb/mongo/commit/1d98478d9d529e886143415cbb5b507362ab45eb
