[SERVER-1951] M/R output param should allow us to use db.collection not only collection
Created: 15/Oct/10  Updated: 12/Jul/16  Resolved: 24/Jan/11

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 1.7.5

Type: Improvement Priority: Minor - P4
Reporter: Alberto Assignee: Antoine Girbal
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Participants:

 Description   

Should add a field to out

{ out : { merge : "foo" , db : "other" } }

The tricky part is filling in the "result" field.
I think the best solution is to have it start with a "."
so the normal case would be "result" : "foo"
and with this option it would be "result" : ".other.foo"

OLD:
We have "rawdata" on a DB, and reduced collections on a different DB.

For us it would be useful to be able to set output='ReducedDB.NewReducedCollection'.

This would help when moving a lot of data from one DB to another.

Would be great in combination with this one:

http://jira.mongodb.org/browse/SERVER-647



 Comments   
Comment by Antoine Girbal [ 24/Jan/11 ]

resolved and added test

Comment by Antoine Girbal [ 19/Jan/11 ]

for sharding, had to modify 2 things:

  • mongos: needs to look at the "out" field and if there is a db, determine the primary shard for that db, and use that shard to generate the result.
  • mongod: MapReduceFinishCommand needs extra output fields
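
The mongos routing rule in the first bullet can be sketched roughly as follows. This is only an illustration, not the mongos implementation: the function names, the `primaryShardByDb` map, and the shard names in the usage note are all hypothetical, and the sketch assumes the primary shard of every database is already known.

```javascript
// Sketch only: these names are hypothetical and do not mirror mongos internals.

// Return the database the M/R output should be written to.
// `outSpec` is the value of the "out" option; `inputDb` is the db the
// command was run against.
function outputDbName(outSpec, inputDb) {
  if (outSpec && typeof outSpec === "object" && typeof outSpec.db === "string") {
    return outSpec.db; // e.g. { replace: "out", db: "testdb" }
  }
  return inputDb; // e.g. "out" or { replace: "out" }
}

// Pick the shard that should generate the final result: the primary shard
// of the output database. `primaryShardByDb` is an assumed Map of
// db name -> shard name.
function shardForOutput(outSpec, inputDb, primaryShardByDb) {
  const db = outputDbName(outSpec, inputDb);
  if (!primaryShardByDb.has(db)) {
    throw new Error("no primary shard known for db " + db);
  }
  return primaryShardByDb.get(db);
}
```

For example, with a map of `test` to `shardA` and `testdb` to `shardB`, an `out` of `{ replace: "out", db: "testdb" }` issued against `test` would select `shardB`, while `{ replace: "out" }` would select `shardA`.
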

Comment by auto [ 19/Jan/11 ]

Author: agirbal (antoine@10gen.com)

Message: SERVER-1951: support for output to other DB with sharding. Test in jstests/sharding/bigMapReduce.js
https://github.com/mongodb/mongo/commit/1830f9cb5e948abcf5e0fa300b941f46a7c57b27

Comment by Antoine Girbal [ 18/Jan/11 ]

re: sharding
looks like the collection gets properly created in db, but mongos does not show the new collection..

From shell connected to mongos:
> a = db.users.mapReduce(map, reduce, {out: { replace : "out", db: "testdb"}});
{
    "result" : "out",
    "shardCounts" : {
        "localhost:27018" : { "input" : 1293, "emit" : 1293, "output" : 1293 },
        "localhost:27020" : { "input" : 709, "emit" : 709, "output" : 709 }
    },
    "counts" : { "emit" : NumberLong(2002), "input" : NumberLong(2002), "output" : NumberLong(2002) },
    "ok" : 1,
    "timeMillis" : 120,
    "timing" : { "shards" : 73, "final" : 47 },
}
> use testdb
switched to db testdb
> show collections
> db.out.find()
>

But then if I log into 1st shard directly..
$ ./mongo --port 27018
MongoDB shell version: 1.7.5-pre-
connecting to: 127.0.0.1:27018/test
> use testdb
switched to db testdb
> db.out.find()

{ "_id" : "Girbal", "value" : 1 }
{ "_id" : "aazpcvqvvf", "value" : 1 }
{ "_id" : "abhuxeyusd", "value" : 1 }
{ "_id" : "ablyriddck", "value" : 1 }
{ "_id" : "abvwdmzach", "value" : 1 }
{ "_id" : "abwshxeuis", "value" : 1 }
{ "_id" : "acckexkccm", "value" : 1 }
{ "_id" : "aefdqlxiod", "value" : 1 }

is it that mongos needs to update its map of NS?

Comment by Eliot Horowitz (Inactive) [ 17/Jan/11 ]

Isn't working sharded.

Comment by Antoine Girbal [ 06/Jan/11 ]

added test in jstests/mr_replaceIntoDB.js

Comment by auto [ 06/Jan/11 ]

Author: agirbal (antoine@10gen.com)

Message: SERVER-1951: M/R output param should allow us to use db.collection not only collection
https://github.com/mongodb/mongo/commit/da0bc2917778593d0368b89427d9d504a1230389

Comment by Antoine Girbal [ 06/Jan/11 ]

ok this is implemented

foo:PRIMARY> db.users.mapReduce(map, reduce, {out: { replace : "outcoll", db:"testdb"}});
{
    "result" : { "db" : "testdb", "collection" : "outcoll" },
    "timeMillis" : 1154,
    "counts" : { "input" : 4000, "emit" : 4000, "output" : 4000 },
    "ok" : 1,
}
foo:PRIMARY> db.users.mapReduce(map, reduce, {out: { replace : "outcoll"}});
{
    "result" : "outcoll",
    "timeMillis" : 651,
    "counts" : { "input" : 4000, "emit" : 4000, "output" : 4000 },
    "ok" : 1,
}
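
Since "result" is now either a plain string or a { db, collection } document, client code reading it may want to handle both shapes. A minimal sketch, assuming only the two shapes shown in the output above; `resultNamespace` is a hypothetical helper, not something the shell or server provides:

```javascript
// Hypothetical helper: normalize the "result" field of a mapReduce reply.
// `res` is the reply document; `currentDb` is the name of the database the
// command was run against.
function resultNamespace(res, currentDb) {
  const r = res.result;
  if (typeof r === "string") {
    // Output went to a collection in the same db: "result" : "outcoll"
    return { db: currentDb, collection: r };
  }
  if (r && typeof r === "object" && typeof r.collection === "string") {
    // Output went to another db: "result" : { "db" : ..., "collection" : ... }
    return { db: r.db || currentDb, collection: r.collection };
  }
  throw new Error("unexpected mapReduce result field: " + JSON.stringify(r));
}
```

In the shell, the returned pair could then be used as `db.getSiblingDB(ns.db).getCollection(ns.collection)` to read the output collection regardless of which form the reply took.
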

Comment by Eliot Horowitz (Inactive) [ 06/Jan/11 ]

We use that notation in a few places, but it is a bit subtle.

resultDB is ok.

One thing to keep in mind is we'll probably add an option to write result to an arbitrary server/db/collection at some point as well.

Comment by Antoine Girbal [ 06/Jan/11 ]

I'm not sure the ".other.foo" is the most intuitive way to represent, may look like a bug to ppl.
Unless we use this notation in other places, i.e. a "." to mean other db?
If not, why not use another result field with "db" : "other" or "resultDB":"other".
If user is already explicitly asking for other db, then they are already aware of where collection will be, and otherwise the "db":"other" will remind them.
