[SERVER-899] mapReduce fails on large sharded collection Created: 08/Jan/10 Updated: 12/Jul/16 Resolved: 27/Jan/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 1.3.5 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Ben B. | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
mongos built from git clone on 2010/1/7 |
||
| Attachments: |
|
| Participants: |
| Description |
|
The following mapReduce runs successfully on a smaller collection in the same sharded DB, or on the same collection in a non-sharded DB.
--------------------------------------------------------------------------------
> $mapv = function () { emit(this.v, 1); }
function () { } ); |
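For readers unfamiliar with the map function quoted above, here is a minimal local sketch of what such a job computes. It is plain JavaScript with no MongoDB server involved. `runMapReduce`, `sampleDocs`, and the body of `reducev` are hypothetical: the reduce function in this ticket is truncated, so a summing reduce is assumed purely for illustration. The "sharded" half is a simplification of the idea that per-shard partial results get reduced again, not a description of how mongos actually implements it.

```javascript
// Stand-in for the shell's global emit(); rebound on every run.
let emit;

// Hypothetical helper: applies map to each document (bound as `this`),
// groups the emitted values by key, then reduces each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc);
  const out = {};
  for (const [key, values] of groups) {
    // A key with a single emitted value does not need a reduce call.
    out[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return out;
}

// Map function as quoted in the description.
const mapv = function () { emit(this.v, 1); };

// ASSUMPTION: the ticket's reduce body is truncated; a sum is used here.
const reducev = function (key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
};

// Unsharded case: one pass over all (made-up) documents.
const sampleDocs = [{ v: "a" }, { v: "b" }, { v: "a" }];
const counts = runMapReduce(sampleDocs, mapv, reducev);

// Simplified sharded case: each "shard" reduces its own documents,
// then the partial results are reduced again per key.
const shardA = runMapReduce([{ v: "a" }, { v: "b" }], mapv, reducev);
const shardB = runMapReduce([{ v: "a" }], mapv, reducev);
const merged = {};
for (const partial of [shardA, shardB]) {
  for (const [key, value] of Object.entries(partial)) {
    merged[key] = key in merged ? reducev(key, [merged[key], value]) : value;
  }
}

console.log(counts, merged);
```

Both paths give the same counts here because the assumed reduce returns a value of the same shape as the values it consumes, so it can safely be applied to its own output.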
| Comments |
| Comment by Douglas Hunter [ 27/Jan/10 ] |
|
I can no longer reproduce this bug with a build from git master. Lovely, thanks! |
| Comment by Mathias Stearn [ 26/Jan/10 ] |
|
@Dan and Ben: Could you try running with a mongod compiled from git master or apply that diff to whichever version you are using? I can now reliably run the provided test code. |
| Comment by auto [ 26/Jan/10 ] |
|
Author: {'login': 'RedBeard0531', 'name': 'Mathias Stearn', 'email': 'mathias@10gen.com'}
Message: Fix bug in sharded MapReduce. |
| Comment by Eliot Horowitz (Inactive) [ 26/Jan/10 ] |
|
Can you run the attached code with the data and try to reproduce? |
| Comment by Douglas Hunter [ 26/Jan/10 ] |
|
Both scripts need $MONGO_BIN and $DB_DUMP configured to point to the mongodb bin directory and the buzzfeed.report.json.gz file, respectively.
exhibit-a.pl starts a single mongod process, loads the db dump, successfully executes a mapreduce command, and then prints a report.
exhibit-b.pl starts and configures a sharded setup, loads the db dump, and then fails to execute a mapreduce command, dying with:
client says: query error: mongos: DBClientBase::findOne: transport error at /usr/local/lib/perl5/site_perl/5.10.1/i686-linux/MongoDB/Connection.pm line 196.
/tmp/mongos.log says: Tue Jan 26 12:31:27 Assertion: 10276:DBClientBase::findOne: transport error |
| Comment by Douglas Hunter [ 25/Jan/10 ] |
|
I'm seeing a similar problem (tested with the mongodb-linux-i686-1.2.1 and mongodb-linux-i686-2010-01-19 binaries), but it manifests itself differently.
MongoDB shell version: 1.2.1
> db.report.count()
> db.getSisterDB( "config" ).chunks.find().length()
> printShardingStatus( db.getSisterDB( "config" ) );
shards:
{ "_id" : ObjectId("4b5dd862cb9a647fc3a70b40"), "host" : "localhost:3335" }
{ "_id" : ObjectId("4b5dd865cb9a647fc3a70b41"), "host" : "localhost:3336" }
{ "_id" : ObjectId("4b5dd869cb9a647fc3a70b42"), "host" : "localhost:3337" }
{ "_id" : ObjectId("4b5dd86ccb9a647fc3a70b43"), "host" : "localhost:3338" }
databases:
{ "name" : "admin", "partitioned" : false, "primary" : "localhost:3334", "_id" : ObjectId("4b5dd8626584f9ac129a2293") }
my chunks
, "unique" : false } }, "_id" : ObjectId("4b5dd8766584f9ac129a2294") } } -->> { "timestamp" : "14628" } on : localhost:3337 { "t" : 1264442588000, "i" : 1 }
buzzfeed.report { "timestamp" : "14628" } -->> { "timestamp" : "14628" } on : localhost:3336 { "t" : 1264442772000, "i" : 1 }
buzzfeed.report { "timestamp" : "14629" } -->> { "timestamp" : { $maxKey : 1 }} on : localhost:3338 { "t" : 1264442924000, "i" : 1 }
buzzfeed.report { "timestamp" : "14628" } -->> { "timestamp" : "14628" } on : localhost:3338 { "t" : 1264442866000, "i" : 2 }
buzzfeed.report { "timestamp" : "14628" } -->> { "timestamp" : "14628" } on : localhost:3336 { "t" : 1264442924000, "i" : 2 }
buzzfeed.report { "timestamp" : "14628" } -->> { "timestamp" : "14629" } on : localhost:3338 { "t" : 1264442923000, "i" : 3 }
> db.report.findOne()
> m = function () { );
> r = function (key, values) { ); ;
> res = db.runCommand( { mapreduce: "report", map : m, reduce : r });
Mon Jan 25 13:37:52 JS Error: uncaught exception: error { "$err" : "mongos: DBClientBase::findOne: transport error" }
And from the log of the primary shard:
Mon Jan 25 13:37:51 Got signal: 11 (Segmentation fault). |