[SERVER-15842] Server aborts during concurrent mapreduce and drop database Created: 28/Oct/14  Updated: 11/Jul/16  Resolved: 04/Nov/14

Status: Closed
Project: Core Server
Component/s: Concurrency, MapReduce
Affects Version/s: 2.7.8
Fix Version/s: 2.8.0-rc0

Type: Bug Priority: Major - P3
Reporter: Andrew Emil (Inactive) Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-16038 CanonicalQuery leaks in mr::MapReduce... Closed
is related to SERVER-15087 Server crashes when running concurren... Closed
Operating System: ALL
Participants:

 Description   

Triggered this bug against a mongod with the following build info:

> db.runCommand({buildinfo: 1})
{
	"version" : "2.7.9-pre-",
	"gitVersion" : "364b03260ef5e06a58568c57a1e44fd2fcd7cb8e",
	"OpenSSLVersion" : "",
	"sysInfo" : "Linux acebox 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 BOOST_LIB_VERSION=1_49",
	"loaderFlags" : "-fPIC -pthread -Wl,-z,now -rdynamic",
	"compilerFlags" : "-Wnon-virtual-dtor -Woverloaded-virtual -std=c++11 -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -pipe -Werror -O3 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-but-set-variable -fno-builtin-memcmp -std=c99",
	"allocator" : "tcmalloc",
	"versionArray" : [
		2,
		7,
		9,
		-100
	],
	"javascriptEngine" : "V8",
	"bits" : 64,
	"debug" : false,
	"maxBsonObjectSize" : 16777216,
	"ok" : 1
}
> 

Demangled stack trace:

 mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x23) [0xfad813]
 mongod(+0xBAD3C2) [0xfad3c2]
 mongod(+0xBAD746) [0xfad746]
 libc.so.6(+0x324F0) [0x7f717c2e84f0]
 libc.so.6(gsignal+0x35) [0x7f717c2e8475]
 libc.so.6(abort+0x180) [0x7f717c2eb6f0]
 mongod(mongo::invariantFailed(char const*, char const*, unsigned int)+0xCA) [0xf47cca]
 mongod(mongo::mr::MapReduceCommand::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool)+0x16E2) [0xa471a2]
 mongod(mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool)+0x42) [0xa8f7e2]
 mongod(mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool)+0x8EB) [0xa902cb]
 mongod(mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int)+0x487) [0xa90c17]
 mongod(mongo::newRunQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&, bool)+0x7EF) [0xc5c86f]
 mongod(mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&, bool)+0xC26) [0xb63f26]
 mongod(mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*)+0xA3) [0x8e7a93]
 mongod(mongo::PortMessageServer::handleIncomingMsg(void*)+0x48B) [0xf743cb]
 libpthread.so.0(+0x6B50) [0x7f717d1f2b50]
 libc.so.6(clone+0x6D) [0x7f717c3920ed]
-----  END BACKTRACE  -----

Relevent log snippit:

2014-10-28T09:07:56.771-0700 I COMMANDS [conn2] dropDatabase testdb starting
2014-10-28T09:07:56.776-0700 D STORAGE  [conn2] dropDatabase testdb
2014-10-28T09:07:56.776-0700 D JOURNAL  [journal] journal REMAPPRIVATEVIEW done startedAt: 2 n:1 0ms
2014-10-28T09:07:56.795-0700 I JOURNAL  [conn2] removeJournalFiles
2014-10-28T09:07:56.804-0700 D STORAGE  [conn2] flushing directory /home/ace/scratch/CAP-1268/data/journal
2014-10-28T09:07:56.822-0700 D JOURNAL  [conn2] removeJournalFiles end
2014-10-28T09:07:56.824-0700 D JOURNAL  [journal] journal REMAPPRIVATEVIEW done startedAt: 3 n:1 1ms
2014-10-28T09:07:56.824-0700 D JOURNAL  [journal] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
2014-10-28T09:07:56.829-0700 D STORAGE  [conn2] remove file /home/ace/scratch/CAP-1268/data/testdb.ns
2014-10-28T09:07:56.831-0700 I COMMANDS [conn2] dropDatabase testdb finished
2014-10-28T09:07:56.831-0700 I QUERY    [conn2] command testdb.$cmd command: dropDatabase { dropDatabase: 1.0 } keyUpdates:0 numYields:0  reslen:57 230ms
2014-10-28T09:07:56.831-0700 I -        [conn3] Invariant failure db src/mongo/db/commands/mr.cpp 1366
2014-10-28T09:07:56.845-0700 I CONTROL  [conn3] 
 0xfad813 0xf6260b 0xf47c99 0xa471a2 0xa8f7e2 0xa902cb 0xa90c17 0xc5c86f 0xb63f26 0x8e7a93 0xf743cb 0x7f717d1f2b50 0x7f717c3920ed
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"BAD813"},{"b":"400000","o":"B6260B"},{"b":"400000","o":"B47C99"},{"b":"400000","o":"6471A2"},{"b":"400000","o":"68F7E2"},{"b":"400000","o":"6902CB"},{"b":"400000","o":"690C17"},{"b":"400000","o":"85C86F"},{"b":"400000","o":"763F26"},{"b":"400000","o":"4E7A93"},{"b":"400000","o":"B743CB"},{"b":"7F717D1EC000","o":"6B50"},{"b":"7F717C2B6000","o":"DC0ED"}],"processInfo":{ "mongodbVersion" : "2.7.9-pre-", "gitVersion" : "364b03260ef5e06a58568c57a1e44fd2fcd7cb8e", "uname" : { "sysname" : "Linux", "release" : "3.2.0-4-amd64", "version" : "#1 SMP Debian 3.2.54-2", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "B787723AE5862AF9456FDA1E3A515DBE580B06D5" }, { "b" : "7FFF77BEF000", "elfType" : 3, "buildId" : "DCCF85726CC4A0AF9876B5683FE535A43F8C7B93" }, { "b" : "7F717D1EC000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "18A3AA880A2D6668749F9E0B3A62E31CC332BCCB" }, { "b" : "7F717CFE4000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "64EDF4AB5F6BB69CC21C5189A5381632828C1DDF" }, { "b" : "7F717CDE0000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "0A2AFB493CB6339DC33C3FCD42B31A56256DB883" }, { "b" : "7F717CAD9000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "8711429397A5AF8B6269B867D830EDF6E0225B8D" }, { "b" : "7F717C857000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "6F617CE57B239A34D5262165156D3C427FBC4E6F" }, { "b" : "7F717C641000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "F980B1188708F8D8B5C35D185444AF4CB939AA1E" }, { "b" : "7F717C2B6000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "44E7FC086557EB258EAFE64F71E5E5D4E46DF815" }, { "b" : "7F717D408000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "3A7624F2274F9BD6E3D5F171B72BC08D1F7F8EE0" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x23) [0xfad813]
 mongod(_ZN5mongo10logContextEPKc+0xDB) [0xf6260b]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0x99) [0xf47c99]
 mongod(_ZN5mongo2mr16MapReduceCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x16E2) [0xa471a2]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x42) [0xa8f7e2]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x8EB) [0xa902cb]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x487) [0xa90c17]
 mongod(_ZN5mongo11newRunQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERNS_5CurOpES3_b+0x7EF) [0xc5c86f]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xC26) [0xb63f26]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xA3) [0x8e7a93]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x48B) [0xf743cb]
 libpthread.so.0(+0x6B50) [0x7f717d1f2b50]
 libc.so.6(clone+0x6D) [0x7f717c3920ed]
-----  END BACKTRACE  -----
2014-10-28T09:07:56.845-0700 I -        [conn3] 
 
***aborting after invariant() failure
 
 
2014-10-28T09:07:56.859-0700 F -        [conn3] Got signal: 6 (Aborted).
 
 0xfad813 0xfad3c2 0xfad746 0x7f717c2e84f0 0x7f717c2e8475 0x7f717c2eb6f0 0xf47cca 0xa471a2 0xa8f7e2 0xa902cb 0xa90c17 0xc5c86f 0xb63f26 0x8e7a93 0xf743cb 0x7f717d1f2b50 0x7f717c3920ed
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"BAD813"},{"b":"400000","o":"BAD3C2"},{"b":"400000","o":"BAD746"},{"b":"7F717C2B6000","o":"324F0"},{"b":"7F717C2B6000","o":"32475"},{"b":"7F717C2B6000","o":"356F0"},{"b":"400000","o":"B47CCA"},{"b":"400000","o":"6471A2"},{"b":"400000","o":"68F7E2"},{"b":"400000","o":"6902CB"},{"b":"400000","o":"690C17"},{"b":"400000","o":"85C86F"},{"b":"400000","o":"763F26"},{"b":"400000","o":"4E7A93"},{"b":"400000","o":"B743CB"},{"b":"7F717D1EC000","o":"6B50"},{"b":"7F717C2B6000","o":"DC0ED"}],"processInfo":{ "mongodbVersion" : "2.7.9-pre-", "gitVersion" : "364b03260ef5e06a58568c57a1e44fd2fcd7cb8e", "uname" : { "sysname" : "Linux", "release" : "3.2.0-4-amd64", "version" : "#1 SMP Debian 3.2.54-2", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "B787723AE5862AF9456FDA1E3A515DBE580B06D5" }, { "b" : "7FFF77BEF000", "elfType" : 3, "buildId" : "DCCF85726CC4A0AF9876B5683FE535A43F8C7B93" }, { "b" : "7F717D1EC000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "18A3AA880A2D6668749F9E0B3A62E31CC332BCCB" }, { "b" : "7F717CFE4000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "64EDF4AB5F6BB69CC21C5189A5381632828C1DDF" }, { "b" : "7F717CDE0000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "0A2AFB493CB6339DC33C3FCD42B31A56256DB883" }, { "b" : "7F717CAD9000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "8711429397A5AF8B6269B867D830EDF6E0225B8D" }, { "b" : "7F717C857000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "6F617CE57B239A34D5262165156D3C427FBC4E6F" }, { "b" : "7F717C641000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "F980B1188708F8D8B5C35D185444AF4CB939AA1E" }, { "b" : "7F717C2B6000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "44E7FC086557EB258EAFE64F71E5E5D4E46DF815" }, { "b" : "7F717D408000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "3A7624F2274F9BD6E3D5F171B72BC08D1F7F8EE0" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x23) [0xfad813]
 mongod(+0xBAD3C2) [0xfad3c2]
 mongod(+0xBAD746) [0xfad746]
 libc.so.6(+0x324F0) [0x7f717c2e84f0]
 libc.so.6(gsignal+0x35) [0x7f717c2e8475]
 libc.so.6(abort+0x180) [0x7f717c2eb6f0]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xCA) [0xf47cca]
 mongod(_ZN5mongo2mr16MapReduceCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x16E2) [0xa471a2]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x42) [0xa8f7e2]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x8EB) [0xa902cb]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x487) [0xa90c17]
 mongod(_ZN5mongo11newRunQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERNS_5CurOpES3_b+0x7EF) [0xc5c86f]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xC26) [0xb63f26]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xA3) [0x8e7a93]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x48B) [0xf743cb]
 libpthread.so.0(+0x6B50) [0x7f717d1f2b50]
 libc.so.6(clone+0x6D) [0x7f717c3920ed]
-----  END BACKTRACE  -----



 Comments   
Comment by Githook User [ 03/Nov/14 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-15842 check db pointer after establishing read lock in M/R
Branch: master
https://github.com/mongodb/mongo/commit/23b96d4da3444d260eee24d7217a8054f3aacfa3

Comment by Eric Milkie [ 31/Oct/14 ]

I think I see the problem. In 2.6, upon initial entry into the MR command's run method, we got a ReadContext on the database. This implicitly creates the db if it doesn't exist.

In 2.7, for collection level locking support, this was changed to a simple DBRead lock, which doesn't implicitly create databases. The code then attempts to get the db pointer and assumes it will always be valid (this is the invariant that is firing). I think instead, we should see if the database isn't there and implicitly create it. Alternatively, we could return an error from the command.

Comment by Eric Milkie [ 28/Oct/14 ]

I noticed while looking at example usage of PlanExecutor that there is some MR code that ignores unexpected errors coming from getNext(). Now that we can yield in executors again, we should probably audit all usage of getNext() and make sure we are handling errors.

Generated at Thu Feb 08 03:39:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.