[SERVER-899] mapReduce fails on large sharded collection
Created: 08/Jan/10  Updated: 12/Jul/16  Resolved: 27/Jan/10

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 1.3.5

Type: Bug
Priority: Critical - P2
Reporter: Ben B.
Assignee: Mathias Stearn
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

mongos built from git clone on 2010/1/7
$ uname -a
Linux toolbox 2.6.18-53.el5 #1 SMP Mon Nov 12 02:22:48 EST 2007 i686 i686 i386 GNU/Linux


Attachments: File buzzfeed.report.json.gz     File exhibit-a.pl     File exhibit-b.pl    
Participants:

 Description   

The following mapReduce runs successfully on a smaller collection in the same sharded DB, and on the same collection in a non-sharded DB, but fails on the large sharded collection.

--------------------------------------------------------------------------------

> $mapv = function () { emit(this.v, 1); }
function () {
    emit(this.v, 1);
}
> $reducev = function ($vis_id, $counts) { var $count = 0; $counts.forEach(function ($c) {$count += $c;}); return $count; }
function ($vis_id, $counts) {
    var $count = 0;
    $counts.forEach(function ($c) {$count += $c;});
    return $count;
}
> $resv = db.vis1.mapReduce( $mapv, $reducev, { query : { p : '78053631' } } );
Fri Jan 8 12:54:08 JS Error: uncaught exception: map reduce failed: final reduce failed: { result: "tmp.mr.mapreduce_1262973236_124", errmsg: "assertion: assertion db/jsobj.h:1584", ok: 0.0 }
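For reference, the result this job is expected to produce can be sketched in plain JavaScript without a server. This is only an illustration of the map and reduce logic above: the sample documents, the `runMapReduce` helper, and the names `mapV` and `reduceV` are hypothetical, and the helper is not how the server executes mapReduce.

```javascript
// Plain-JavaScript emulation of the count-per-key job; no MongoDB required.
// The sample documents are hypothetical, not taken from the reporter's data.
const sampleDocs = [
  { p: "78053631", v: "a" },
  { p: "78053631", v: "b" },
  { p: "78053631", v: "a" },
  { p: "other", v: "a" }, // excluded by the query
];

// Same logic as $mapv, with emit() passed in explicitly.
function mapV(doc, emit) {
  emit(doc.v, 1);
}

// Same logic as $reducev: sum the emitted counts for one key.
function reduceV(key, counts) {
  let count = 0;
  counts.forEach(function (c) { count += c; });
  return count;
}

// Filter by an equality query, group emitted values by key, reduce each group.
function runMapReduce(docs, query, map, reduce) {
  const groups = new Map();
  docs
    .filter((doc) => Object.keys(query).every((k) => doc[k] === query[k]))
    .forEach((doc) =>
      map(doc, (key, value) => {
        if (!groups.has(key)) groups.set(key, []);
        groups.get(key).push(value);
      })
    );
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

const counts = runMapReduce(sampleDocs, { p: "78053631" }, mapV, reduceV);
console.log(counts); // { a: 2, b: 1 }
```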

 Comments   
Comment by Douglas Hunter [ 27/Jan/10 ]

I can no longer reproduce this bug with a build from git master. Lovely, thanks!

Comment by Mathias Stearn [ 26/Jan/10 ]

@Dan and Ben: Could you try running with a mongod compiled from git master or apply that diff to whichever version you are using? I can now reliably run the provided test code.

Comment by auto [ 26/Jan/10 ]

Author: {'login': 'RedBeard0531', 'name': 'Mathias Stearn', 'email': 'mathias@10gen.com'}

Message: Fix bug in sharded MapReduce. SHARDING-68
http://github.com/mongodb/mongo/commit/922532f0a5b2d181d3a84c3c774c590a6b16dc33

Comment by Eliot Horowitz (Inactive) [ 26/Jan/10 ]

can you run attached code w/data and try to reproduce

Comment by Douglas Hunter [ 26/Jan/10 ]

Both scripts need to have $MONGO_BIN and $DB_DUMP configured to point to the mongodb bin directory and the buzzfeed.report.json.gz file respectively.

exhibit-a.pl starts a single mongod process, loads the db dump and successfully executes a mapreduce command, then prints a report.

exhibit-b.pl starts and configures a sharded setup, loads the db dump and unsuccessfully executes a mapreduce command, dying with:

client says:

query error: mongos: DBClientBase::findOne: transport error at /usr/local/lib/perl5/site_perl/5.10.1/i686-linux/MongoDB/Connection.pm line 196.

/tmp/mongos.log says:

Tue Jan 26 12:31:27 Assertion: 10276:DBClientBase::findOne: transport error
Tue Jan 26 12:31:27 Assertion: 10276:DBClientBase::findOne: transport error
0x80f5c13 0x810589c 0x8114943 0x81505f9 0x80e4443 0x80e4dcf 0x81a221d 0xb7e04fda 0xb7d8c93e
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo11msgassertedEiPKc+0x223) [0x80f5c13]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo17DBClientInterface7findOneERKSsNS_5QueryEPKNS_7BSONObjEi+0x1ac) [0x810589c]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo20DBClientWithCommands10runCommandERKSsRKNS_7BSONObjERS3_i+0x93) [0x8114943]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo17WriteBackListener3runEv+0x399) [0x81505f9]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo13BackgroundJob3thrEv+0x43) [0x80e4443]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5boost6detail11thread_dataIPFvvEE3runEv+0xf) [0x80e4dcf]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(thread_proxy+0x7d) [0x81a221d]
/lib/libpthread.so.0 [0xb7e04fda]
/lib/libc.so.6(clone+0x5e) [0xb7d8c93e]
0x80f5c13 0x810589c 0x8114943 0x816cb49 0x80ca48a 0x81587ee 0x81727c3 0x8176ec7 0x8149dac 0x80e4dcf 0x81a221d 0xb7e04fda 0xb7d8c93e
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo11msgassertedEiPKc+0x223) [0x80f5c13]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo17DBClientInterface7findOneERKSsNS_5QueryEPKNS_7BSONObjEi+0x1ac) [0x810589c]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo20DBClientWithCommands10runCommandERKSsRKNS_7BSONObjERS3_i+0x93) [0x8114943]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo15dbgrid_pub_cmds5MRCmd3runEPKcRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x10e9) [0x816cb49]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo27runCommandAgainstRegisteredEPKcRNS_7BSONObjERNS_14BSONObjBuilderE+0xc7a) [0x80ca48a]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo14SingleStrategy7queryOpERNS_7RequestE+0x26e) [0x81587ee]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo7Request7processEi+0x413) [0x81727c3]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0x147) [0x8176ec7]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5mongo3pms9threadRunEv+0x7c) [0x8149dac]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(_ZN5boost6detail11thread_dataIPFvvEE3runEv+0xf) [0x80e4dcf]
/home/dug/src/mongodb-linux-i686-2010-01-19/bin/mongos(thread_proxy+0x7d) [0x81a221d]
/lib/libpthread.so.0 [0xb7e04fda]
/lib/libc.so.6(clone+0x5e) [0xb7d8c93e]


Comment by Douglas Hunter [ 25/Jan/10 ]

I'm seeing a similar problem (tested with the mongodb-linux-i686-1.2.1 and mongodb-linux-i686-2010-01-19 binaries), but it manifests itself differently.

MongoDB shell version: 1.2.1
url: buzzfeed
connecting to: buzzfeed
type "help" for help

> db.report.count()
500000

> db.getSisterDB( "config" ).chunks.find().length()
6

> printShardingStatus( db.getSisterDB( "config" ) );
--- Sharding Status ---
sharding version: { "_id" : ObjectId("4b5dd7bdcb9a647fc3a70b3f"), "version" : 2 }
shards:
  { "_id" : ObjectId("4b5dd862cb9a647fc3a70b40"), "host" : "localhost:3335" }
  { "_id" : ObjectId("4b5dd865cb9a647fc3a70b41"), "host" : "localhost:3336" }
  { "_id" : ObjectId("4b5dd869cb9a647fc3a70b42"), "host" : "localhost:3337" }
  { "_id" : ObjectId("4b5dd86ccb9a647fc3a70b43"), "host" : "localhost:3338" }
databases:
  { "name" : "admin", "partitioned" : false, "primary" : "localhost:3334", "_id" : ObjectId("4b5dd8626584f9ac129a2293") }
  my chunks
  { "name" : "buzzfeed", "partitioned" : true, "primary" : "localhost:3338", "sharded" : { "buzzfeed.report" : { "key" : { "timestamp" : 1 }, "unique" : false } }, "_id" : ObjectId("4b5dd8766584f9ac129a2294") }
  my chunks
    buzzfeed.report { "timestamp" : { $minKey : 1 } } -->> { "timestamp" : "14628" } on : localhost:3337 { "t" : 1264442588000, "i" : 1 }
    buzzfeed.report { "timestamp" : "14628" } -->> { "timestamp" : "14628" } on : localhost:3336 { "t" : 1264442772000, "i" : 1 }
    buzzfeed.report { "timestamp" : "14629" } -->> { "timestamp" : { $maxKey : 1 } } on : localhost:3338 { "t" : 1264442924000, "i" : 1 }
    buzzfeed.report { "timestamp" : "14628" } -->> { "timestamp" : "14628" } on : localhost:3338 { "t" : 1264442866000, "i" : 2 }
    buzzfeed.report { "timestamp" : "14628" } -->> { "timestamp" : "14628" } on : localhost:3336 { "t" : 1264442924000, "i" : 2 }
    buzzfeed.report { "timestamp" : "14628" } -->> { "timestamp" : "14629" } on : localhost:3338 { "t" : 1264442923000, "i" : 3 }
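Several of the chunks listed above have identical lower and upper bounds. A minimal sketch for flagging them follows, with the bounds transcribed by hand from the output above. The `chunks` array layout and the `"$minKey"` / `"$maxKey"` placeholder strings are assumptions made for illustration, and nothing in this ticket establishes whether these ranges are related to the failure.

```javascript
// Chunk bounds transcribed by hand from the printShardingStatus output.
// "$minKey" / "$maxKey" are placeholder strings standing in for the special bounds.
const chunks = [
  { min: "$minKey", max: "14628", shard: "localhost:3337" },
  { min: "14628", max: "14628", shard: "localhost:3336" },
  { min: "14629", max: "$maxKey", shard: "localhost:3338" },
  { min: "14628", max: "14628", shard: "localhost:3338" },
  { min: "14628", max: "14628", shard: "localhost:3336" },
  { min: "14628", max: "14629", shard: "localhost:3338" },
];

// A chunk whose lower and upper bounds are equal covers an empty key range.
const degenerate = chunks.filter((c) => c.min === c.max);

console.log(degenerate.length); // 3
console.log(degenerate.map((c) => c.shard));
```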

> db.report.findOne()
{
"_id" : ObjectId("4b5dde26da8ed17a70d90a71"),
"query" : "glenn beck's 912",
"buzzid" : "fffeb50517894e3a5b03dd21da775fd4",
"userid" : "10032",
"total" : "1",
"domain" : "bing.com",
"buzz" : "http://www.huffingtonpost.com/tag/glenn-beck-912/1",
"timestamp" : "14627",
"user" : "",
"type" : "3"
}

> m = function () {
    emit(this.buzzid, {total:this.total});
};

> r = function (key, values) {
    var sum = 0;
    values.forEach(function (doc) {sum += parseInt(doc.total);});
    return {total:sum};
};

> res = db.runCommand( { mapreduce: "report", map : m, reduce : r } );

Mon Jan 25 13:37:52 JS Error: uncaught exception: error { "$err" : "mongos: DBClientBase::findOne: transport error" }
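In a sharded run, partial results reduced on each shard are passed through the reduce function again in a final reduce, so the reducer has to accept its own output as input. The sketch below checks that property for the reducer in this comment using plain JavaScript. The sample documents, the two-way split into simulated shards, and the `mapAndReduce` helper are hypothetical, and an explicit radix has been added to `parseInt`. It only exercises the user functions, not the server code path that crashes.

```javascript
// Hypothetical documents shaped like the db.report.findOne() output (fields trimmed).
const docs = [
  { buzzid: "x", total: "1" },
  { buzzid: "x", total: "2" },
  { buzzid: "y", total: "4" },
  { buzzid: "x", total: "3" },
];

// Reducer from the comment, with an explicit radix added to parseInt.
function r(key, values) {
  let sum = 0;
  values.forEach(function (doc) { sum += parseInt(doc.total, 10); });
  return { total: sum };
}

// Map step from the comment (one { total } value per document, keyed by buzzid),
// followed by one reduce per key.
function mapAndReduce(someDocs) {
  const groups = {};
  someDocs.forEach((doc) => {
    (groups[doc.buzzid] = groups[doc.buzzid] || []).push({ total: doc.total });
  });
  const out = {};
  Object.keys(groups).forEach((key) => { out[key] = r(key, groups[key]); });
  return out;
}

// Single pass over all documents, as on a non-sharded database.
const singlePass = mapAndReduce(docs);

// Two simulated shards reduce locally; a final reduce merges the partial results.
const partials = [mapAndReduce(docs.slice(0, 2)), mapAndReduce(docs.slice(2))];
const merged = {};
partials.forEach((partial) => {
  Object.keys(partial).forEach((key) => {
    (merged[key] = merged[key] || []).push(partial[key]);
  });
});
const sharded = {};
Object.keys(merged).forEach((key) => { sharded[key] = r(key, merged[key]); });

console.log(JSON.stringify(singlePass) === JSON.stringify(sharded)); // true
```

Because the reducer returns an object with the same `{total: ...}` shape that the map function emits, the re-reduce gives the same totals as the single pass.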

And from the log of the primary shard:

Mon Jan 25 13:37:51 Got signal: 11 (Segmentation fault).
Mon Jan 25 13:37:51 Backtrace:
0x827b932 0xb7fe0420 0x80cd4df 0x824edbf 0x8259f2e 0x816ad73 0x816ea98 0x81fde11 0x8200a5e 0x827a492 0x810f93f 0x829321d 0xb7e93fda 0xb7e1b93e
../src/mongodb-linux-i686-1.2.1/bin/mongod(_ZN5mongo10abruptQuitEi+0x3c2) [0x827b932]
[0xb7fe0420]
../src/mongodb-linux-i686-1.2.1/bin/mongod(_ZNK5mongo7BSONObj11woSortOrderERKS0_S2_+0x2f) [0x80cd4df]
../src/mongodb-linux-i686-1.2.1/bin/mongod(_ZN5mongo2mr22MapReduceFinishCommand3runEPKcRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x104f) [0x824edbf]
../src/mongodb-linux-i686-1.2.1/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERSt18basic_stringstreamIcSt11char_traitsIcESaIcEERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0xbbe) [0x8259f2e]
../src/mongodb-linux-i686-1.2.1/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERSt18basic_stringstreamIcSt11char_traitsIcESaIcEERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x53) [0x816ad73]
../src/mongodb-linux-i686-1.2.1/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERSt18basic_stringstreamIcSt11char_traitsIcESaIcEE+0x21d8) [0x816ea98]
../src/mongodb-linux-i686-1.2.1/bin/mongod(_ZN5mongo13receivedQueryERNS_10DbResponseERNS_7MessageERSt18basic_stringstreamIcSt11char_traitsIcESaIcEEb+0x101) [0x81fde11]
../src/mongodb-linux-i686-1.2.1/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERK11sockaddr_in+0xb8e) [0x8200a5e]
../src/mongodb-linux-i686-1.2.1/bin/mongod(_ZN5mongo10connThreadEv+0x252) [0x827a492]
../src/mongodb-linux-i686-1.2.1/bin/mongod(_ZN5boost6detail11thread_dataIPFvvEE3runEv+0xf) [0x810f93f]
../src/mongodb-linux-i686-1.2.1/bin/mongod(thread_proxy+0x7d) [0x829321d]
/lib/libpthread.so.0 [0xb7e93fda]
/lib/libc.so.6(clone+0x5e) [0xb7e1b93e]
Mon Jan 25 13:37:51 dbexit:
Mon Jan 25 13:37:51 shutdown: going to flush oplog...
Mon Jan 25 13:37:51 shutdown: going to close sockets...
Mon Jan 25 13:37:51 shutdown: waiting for fs...
Mon Jan 25 13:37:51 shutdown: closing all files...
Mon Jan 25 13:37:51 end connection 127.0.0.1:43478
Mon Jan 25 13:37:51 closeAllFiles() finished
Mon Jan 25 13:37:52 shutdown: removing fs lock...
Mon Jan 25 13:37:52 dbexit: really exiting now
