[SERVER-14969] Dropping index during active aggregation operation can crash server Created: 20/Aug/14  Updated: 11/Jul/16  Resolved: 25/Aug/14

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, MapReduce, Querying
Affects Version/s: 2.6.4
Fix Version/s: 2.6.5, 2.7.6

Type: Bug Priority: Major - P3
Reporter: Roy Assignee: J Rassi
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04.5


Issue Links:
Duplicate
is duplicated by SERVER-15138 Mongo segmentation fault during creat... Closed
Related
is related to SERVER-15019 Killing agg executor doesn't kill und... Closed
is related to SERVER-15053 MongoDB throws segmentation fault whi... Closed
Operating System: Linux
Backport Completed:
Steps To Reproduce:

It happened during an index drop; not sure if that's the cause, or whether it was something happening concurrently (e.g. an index drop racing with a query using that index?).


 Description   
Issue Status as of Aug 25, 2014

ISSUE SUMMARY
The mongod process may crash if an index being used by an aggregation operation is dropped before the operation finishes.

USER IMPACT
The mongod process may crash in this scenario and needs to be restarted.

WORKAROUNDS
Do not drop indices that may be in use by aggregation operations.
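
A minimal mongo shell sketch of that precaution (illustrative only, not from the ticket): check db.currentOp() for in-flight operations on the collection before dropping its indexes. The Tenant_27.files namespace is taken from the log below; the check is advisory, since a new aggregation could still start after it runs.

// Check for in-flight operations on the collection before dropping its indexes.
var ns = "Tenant_27.files";
var busy = db.currentOp().inprog.filter(function(op) { return op.ns === ns; });
if (busy.length === 0) {
    db.getSiblingDB("Tenant_27").files.dropIndexes();
} else {
    print("Deferring dropIndexes: " + busy.length + " operation(s) still running on " + ns);
}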

AFFECTED VERSIONS
MongoDB 2.6 production releases up to 2.6.4 are affected by this issue.

FIX VERSION
The fix is included in the 2.6.5 production release.

RESOLUTION DETAILS
When dropping an index, correctly interrupt any aggregation operations that are using it.
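
A rough way to check the fixed behaviour on a patched build (an illustrative sketch, not part of the official resolution), assuming db.foo has already been populated with large documents as in the reproducer in the comments below. On 2.6.5 and later the aggregation is expected to complete or fail with an error instead of crashing the server.

// Drop and recreate an index while aggregations run against it; on a fixed
// build the mongod should survive, with each aggregate either succeeding or
// being interrupted cleanly.
startParallelShell("for (var i = 0; i < 50; i++) { db.foo.dropIndexes(); sleep(100); db.foo.ensureIndex({a: 1}); }");
for (var i = 0; i < 50; i++) {
    try {
        db.foo.aggregate([{$match: {x: /y/, a: null}}]);
    } catch (e) {
        print("aggregate interrupted (expected, non-crashing outcome): " + e);
    }
}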

Original description

2014-08-20T08:02:57.175+0000 [conn47023] command Tenant_27.$cmd command: mapReduce
.....
2014-08-20T10:05:45.661+0000 [repl writer worker 15] CMD: dropIndexes Tenant_27.files
2014-08-20T10:05:45.879+0000 [repl writer worker 5] CMD: dropIndexes Tenant_27.files
2014-08-20T10:05:45.993+0000 [repl writer worker 13] CMD: dropIndexes Tenant_27.files
2014-08-20T10:05:46.059+0000 [repl writer worker 9] CMD: dropIndexes Tenant_27.files
2014-08-20T10:05:46.091+0000 [conn47023] SEVERE: Invalid access at address: 0xffffffff
2014-08-20T10:05:46.279+0000 [conn47023] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0x11e6111 0x11e54ee 0x11e55df 0x7fd618e54cb0 0xb72dce 0xb6ed3f 0xb6fda9 0xaa698d 0xd78119 0xd7868f 0xd5927e 0xc6e7c4 0xc6eec5 0xc7e9b3 0xc80046 0xcfc7bb 0x9b3816 0xa2889a 0xa29ce2 0xa2bea6
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
 /usr/bin/mongod() [0x11e54ee]
 /usr/bin/mongod() [0x11e55df]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fd618e54cb0]
 /usr/bin/mongod(_ZNK5mongo18BtreeInterfaceImplINS_12BtreeData_V1EE5keyAtEPKNS_17IndexCatalogEntryENS_7DiskLocEi+0x2e) [0xb72dce]
 /usr/bin/mongod(_ZN5mongo16BtreeIndexCursor20isSavedPositionValidEv+0x3f) [0xb6ed3f]
 /usr/bin/mongod(_ZN5mongo16BtreeIndexCursor15restorePositionEv+0x39) [0xb6fda9]
 /usr/bin/mongod(_ZN5mongo9IndexScan16recoverFromYieldEv+0x5d) [0xaa698d]
 /usr/bin/mongod(_ZN5mongo12PlanExecutor12restoreStateEv+0x19) [0xd78119]
 /usr/bin/mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_7DiskLocE+0x28f) [0xd7868f]
 /usr/bin/mongod(_ZN5mongo15MultiPlanRunner7getNextEPNS_7BSONObjEPNS_7DiskLocE+0x29e) [0xd5927e]
 /usr/bin/mongod(_ZN5mongo20DocumentSourceCursor9loadBatchEv+0x1e4) [0xc6e7c4]
 /usr/bin/mongod(_ZN5mongo20DocumentSourceCursor7getNextEv+0x115) [0xc6eec5]
 /usr/bin/mongod(_ZN5mongo19DocumentSourceGroup8populateEv+0x73) [0xc7e9b3]
 /usr/bin/mongod(_ZN5mongo19DocumentSourceGroup7getNextEv+0x426) [0xc80046]
 /usr/bin/mongod(_ZN5mongo8Pipeline3runERNS_14BSONObjBuilderE+0x8b) [0xcfc7bb]
 /usr/bin/mongod(_ZN5mongo15PipelineCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x526) [0x9b3816]
 /usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0xa2889a]
 /usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1042) [0xa29ce2]
 /usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6c6) [0xa2bea6]



 Comments   
Comment by Githook User [ 25/Aug/14 ]

Author: Jason Rassi (jrassi) <rassi@10gen.com>

Message: SERVER-14969 CollectionCursorCache::invalidateAll() kill agg exec

(cherry picked from commit 71e2312d2ad6d418ea223d6e003a065122c926d8)
Branch: v2.6
https://github.com/mongodb/mongo/commit/08883b2312a524722d37dc28ce91c6abf801bb2e

Comment by Githook User [ 25/Aug/14 ]

Author: Jason Rassi (jrassi) <rassi@10gen.com>

Message: SERVER-14969 CollectionCursorCache::invalidateAll() kill agg exec
Branch: master
https://github.com/mongodb/mongo/commit/15a025f43ff3b038c17fce40b7d0802a58ab4e3b

Comment by J Rassi [ 25/Aug/14 ]

The "killing agg executor doesn't kill underlying executor" part of this ticket is actually separate from (and has a separate cause from) the "invalidateAll doesn't kill agg executors" part of this ticket. Broke out the former into a new ticket at SERVER-15019.

Comment by J Rassi [ 21/Aug/14 ]

The issue can be caused by running the dropIndexes command at the same time as an aggregate command. My reproducer follows. With it, I can reproduce the issue in master (2.7.6-pre) and 2.6.4, but not 2.4.11.

db.foo.drop();

// Build a ~1 MB string, then insert 256 large documents so each aggregation
// runs long enough to overlap with the index drop below.
var x = '';
for (var i = 0; i < 1024 * 1024; i++) {
    x += 'x';
}
for (var i = 0; i < 256; i++) {
    db.foo.insert({x: x});
}

// In a parallel shell, repeatedly drop and recreate the {a: 1} index.
startParallelShell("for(;;) { db.foo.dropIndexes(); sleep(100); db.foo.ensureIndex({a:1}) }");

// Run aggregations whose {a: null} predicate can be answered with the {a: 1}
// index; eventually one overlaps with a dropIndexes and an unpatched server crashes.
for (;;) {
    try {
        db.foo.aggregate([{$match: {x: /y/, a: null}}]);
    }
    catch (e) { }
}

The sequence of events that causes the crash is:

1) The aggregation operation acquires a read lock, gets a PipelineRunner with a document source pipeline stage (which will use an index), and releases the read lock.
2) The dropIndexes operation acquires a write lock, drops the index, and releases the write lock.
3) The aggregation re-acquires a read lock (in the document source pipeline stage), the stage attempts to read from the underlying runner, and the crash occurs (since the index has gone away).
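
Mapped onto two interactive shell sessions (an illustrative sketch only; the reproducer above automates the same interleaving with startParallelShell):

// Session A, step 1: the aggregate plans against the {a: 1} index under a read
// lock, then yields the lock while it works through the large documents.
db.foo.aggregate([{$match: {x: /y/, a: null}}]);

// Session B, step 2: while the aggregate in session A is still running,
// dropIndexes takes the write lock and removes the index the plan depends on.
db.foo.dropIndexes();

// Step 3 then happens back in session A: DocumentSourceCursor re-acquires its
// read lock, the underlying runner tries to restore its position in the
// now-deleted index, and an unpatched server hits the segfault shown above.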

As it affects the 2.6 branch, the issue stems from the fact that aggregation cursors aren't properly cleaned up in CollectionCursorCache::invalidateAll(), which is called from the "drop index" machinery. See collection_cursor_cache.cpp:321. A typical aggregation operation has a "user-facing" cursor (which has isAggCursor set to true, and is registered with the collection cursor cache), and an underlying cursor that points into the collection or index being scanned for the document source pipeline stage (which has isAggCursor set to false, and is not registered with the collection cursor cache). invalidateAll() needs to invoke kill() on the PipelineRunner associated with the registered aggregation cursor, which invokes kill() on the underlying runner. However, as currently written, invalidateAll() does not call kill() on the runner for aggregation cursors; this is incorrect.

The issue is more complicated in 2.7.6-pre, however. The Runner abstraction has been removed and replaced with PlanExecutor, and the PlanExecutor stage tree is not notified of kill() operations. It is still the case that the "user-facing" cursor is registered with the collection cursor cache, but even if kill() is invoked on the associated PlanExecutor, the API doesn't allow for the kill to be propagated down to the underlying executor; kill() on a PlanExecutor merely sets the "_killed" flag. The underlying executor needs to be told about the kill, because the parent executor may be in the middle of a getNext() call when the invalidate happens (note that executors with a PipelineProxyStage root execute under no lock; the locking is performed by DocumentSourceCursor when interacting with the underlying executor).

Here's a stack trace for the issue in 2.7.6-pre:

 mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x27) [0x1d4d869]
 mongod(mongo::logContext(char const*)+0x71) [0x1ce4480]
 mongod(mongo::invariantFailed(char const*, char const*, unsigned int)+0xB2) [0x1ccf821]
 mongod(mongo::IndexDescriptor::_checkOk() const+0x90) [0x1728fe6]
 mongod(mongo::IndexDescriptor::isMultikey() const+0x18) [0x14e3b20]
 mongod(mongo::IndexScan::initIndexScan()+0x4A) [0x16104ce]
 mongod(mongo::IndexScan::work(unsigned long*)+0xDC) [0x1610b2e]
 mongod(mongo::FetchStage::work(unsigned long*)+0xAA) [0x15fb70a]
 mongod(mongo::KeepMutationsStage::work(unsigned long*)+0x9A) [0x1612a06]
 mongod(mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::DiskLoc*)+0x59) [0x18d072b]
 mongod(mongo::DocumentSourceCursor::loadBatch()+0x365) [0x17bfd65]
 mongod(mongo::DocumentSourceCursor::getNext()+0x49) [0x17bf92d]
 mongod(mongo::PipelineProxyStage::getNextBson()+0x4E) [0x1635fce]
 mongod(mongo::PipelineProxyStage::work(unsigned long*)+0xFD) [0x1635b67]
 mongod(mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::DiskLoc*)+0x59) [0x18d072b]
 mongod(+0x115C3FE) [0x155c3fe]
 mongod(mongo::PipelineCommand::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool)+0x809) [0x155d9fb]
 mongod(mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool)+0x96) [0x15b2a6b]
 mongod(mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool)+0xBB8) [0x15b3a02]
 mongod(mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int)+0x4D1) [0x15b430a]

Assigning to hari.khalsa@10gen.com for triage. cc redbeard0531.

Comment by Roy [ 20/Aug/14 ]

Hi,
Sorry for the late response.
Glad to hear. Thanks and good luck.

Comment by J Rassi [ 20/Aug/14 ]

Hi Roy,

We are able to reproduce this issue. The information I requested earlier is no longer needed. Thanks again for the report. Please continue to watch this ticket for updates on when a fix may be available, and for possible workaround information.

~ Jason Rassi

Comment by J Rassi [ 20/Aug/14 ]

Demangled stack trace with file/line info:

rassi@rassi:~/mongodb-linux-x86_64-debugsymbols-2.6.4/bin$ addr2line -f -i -e mongod 0x11e6111 0x11e54ee 0x11e55df 0x7fd618e54cb0 0xb72dce 0xb6ed3f 0xb6fda9 0xaa698d 0xd78119 0xd7868f 0xd5927e 0xc6e7c4 0xc6eec5 0xc7e9b3 0xc80046 0xcfc7bb 0x9b3816 0xa2889a 0xa29ce2 0xa2bea6 | c++filt
mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)
/srv/10gen/mci-exec/mci/shell/src/src/mongo/util/stacktrace.cpp:304
abruptQuit
/srv/10gen/mci-exec/mci/shell/src/src/mongo/util/signal_handlers.cpp:107
abruptQuitWithAddrSignal
/srv/10gen/mci-exec/mci/shell/src/src/mongo/util/signal_handlers.cpp:201
??
??:0
mongo::BtreeInterfaceImpl<mongo::BtreeData_V1>::getBucket(mongo::IndexCatalogEntry const*, mongo::DiskLoc const&) const
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/index/btree_interface.cpp:50
mongo::BtreeInterfaceImpl<mongo::BtreeData_V1>::keyAt(mongo::IndexCatalogEntry const*, mongo::DiskLoc, int) const
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/index/btree_interface.cpp:162
mongo::BtreeIndexCursor::isSavedPositionValid()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/index/btree_index_cursor.cpp:260
mongo::BtreeIndexCursor::restorePosition()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/index/btree_index_cursor.cpp:203
mongo::IndexScan::recoverFromYield()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/exec/index_scan.cpp:250
mongo::PlanExecutor::restoreState()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/query/plan_executor.cpp:58
mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::DiskLoc*)
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/query/plan_executor.cpp:161
mongo::MultiPlanRunner::getNext(mongo::BSONObj*, mongo::DiskLoc*)
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/query/multi_plan_runner.cpp:288
mongo::DocumentSourceCursor::loadBatch()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/pipeline/document_source_cursor.cpp:89
mongo::DocumentSourceCursor::getNext()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/pipeline/document_source_cursor.cpp:57
boost::optional_detail::optional_base<mongo::Document>::safe_bool() const
/srv/10gen/mci-exec/mci/shell/src/src/third_party/boost/boost/optional/optional.hpp:442
boost::optional<mongo::Document>::operator bool (boost::optional_detail::optional_base<mongo::Document>::*)() const() const
/srv/10gen/mci-exec/mci/shell/src/src/third_party/boost/boost/optional/optional.hpp:651
mongo::DocumentSourceGroup::populate()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/pipeline/document_source_group.cpp:359
mongo::DocumentSourceGroup::getNext()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/pipeline/document_source_group.cpp:51
boost::optional_detail::optional_base<mongo::Document>::safe_bool() const
/srv/10gen/mci-exec/mci/shell/src/src/third_party/boost/boost/optional/optional.hpp:442
boost::optional<mongo::Document>::operator bool (boost::optional_detail::optional_base<mongo::Document>::*)() const() const
/srv/10gen/mci-exec/mci/shell/src/src/third_party/boost/boost/optional/optional.hpp:651
mongo::Pipeline::run(mongo::BSONObjBuilder&)
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/pipeline/pipeline.cpp:501
boost::scoped_ptr<mongo::ClientCursorPin>::operator mongo::ClientCursorPin* boost::scoped_ptr<mongo::ClientCursorPin>::*() const
/srv/10gen/mci-exec/mci/shell/src/src/third_party/boost/boost/smart_ptr/detail/operator_bool.hpp:47
mongo::PipelineCommand::run(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool)
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/commands/pipeline_command.cpp:359
mongo::_execCommand(mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool)
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/dbcommands.cpp:1357
mongo::Command::execCommand(mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool)
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/dbcommands.cpp:1564
mongo::BSONObjBuilder::_done()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/bson/bsonobjbuilder.h:680
mongo::BSONObjBuilder::done()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/bson/bsonobjbuilder.h:588
mongo::_runCommands(char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int)
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/dbcommands.cpp:1704

Comment by J Rassi [ 20/Aug/14 ]

I also noticed that the stack trace pasted from the log snippet indicates that the server crash happened during a run of the "aggregate" command, not the "mapReduce" command (the log does show that the same connection that was running "aggregate" at 10:05 was running "mapReduce" at 8:02). So, I'll amend my second request for information: do you know what the full invocation was of the aggregate command that was being run on this secondary at the time?

Comment by J Rassi [ 20/Aug/14 ]

Hi Roy,

I'd like to gather additional information to further diagnose this issue:

  • Could you upload the full mongod log for this member as an attachment to the ticket?
  • Do you know what the full invocation was of the mapReduce command that was being run? If it is clear where in your application this is being run, then pasting the relevant line(s) from the application code should be sufficient.
  • Have you subsequently been able to reproduce this issue?

Thanks for reporting the issue.

~ Jason Rassi
