[SERVER-6025] Similar mysql syntax "INSERT ... SELECT ..." for sharded collection. Created: 07/Jun/12  Updated: 06/Dec/22  Resolved: 28/Aug/18

Status: Closed
Project: Core Server
Component/s: Performance, Querying, Sharding, Write Ops
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Pavel Chertorogov Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux, Mongo 2.0.6, 3 shards (each in replica set with 2 machines and 1 arbiter)


Issue Links:
Depends
depends on SERVER-18027 Support aggregation $out to sharded c... Closed
depends on SERVER-12280 allow $out to append to/merge with ex... Closed
Duplicate
duplicates SERVER-18027 Support aggregation $out to sharded c... Closed
Assigned Teams:
Query
Participants:

 Description   

I can not find similar mysql syntax "INSERT ... SELECT ..." in mongo for sharded collection.

db.eval() - does not work on sharded collection
MapReduce - too expensive and slow operation

Use case:
Need select documents from one sharded collection, make rearrangement of fields and upsert it to another sharded collection.
Source collection format:
{
"_id": ObjectId("4fcf7349d144ad591c016534"),
"sid": 111,
"uh": "www.site.kz",
"cid": "4facd5427701a",
"ses": "4fcf7278f1f19",
"ip": NumberInt(1597732605),
"t": ISODate("2012-06-06T15: 08: 40.0Z"),
...
}
Destination collection format:
{
"_id":

{sid: 111, ses:"4fcf7278f1f19"}

,
"t": ISODate("2012-06-06T15: 08: 40.0Z"),
}



 Comments   
Comment by Kyle Suarez [ 28/Aug/18 ]

Hi all,

We've completed work on SERVER-18027, which enables users to use the $out aggregation stage to write to an existing sharded collection. With this, users can use the aggregation framework to achieve the same semantics as an INSERT ... SELECT ..., even when the target collection is sharded.

I'm going to close this ticket as a duplicate of SERVER-18207. For more details, please see SERVER-18207 and its related tickets.

Regards,
Kyle

Comment by Asya Kamsky [ 22/Feb/18 ]

This would be addressed by aggregation $out option once it's able to append to existing collection (whether that collection is sharded or unsharded).

SERVER-12280 plus SERVER-18027 are tracking that feature.

Comment by Pavel Chertorogov [ 08/Jun/12 ]

Detailed description of my task, for which I need mysql syntax
"INSERT ... SELECT ..." in mongo for sharded collection

Actually full document from SOURCE collection looks like this:
{
"_id": ObjectId("
4fd09e12d144ad3b020152ac"),
"sd": NumberInt(15498), // shard day - need for sharding for
parallel writes and best MapReduce tasks for reports by hour
"sh": NumberInt(12), // shard hour - need for sharding for
parallel writes and best MapReduce tasks for reports by hour
"sr": NumberInt(754), // shard random - need for sharding for
parallel writes and best MapReduce tasks for reports by hour
"sid": NumberInt(53255), // site id
"uh": "knigo.info", // site host
"up": "\/publ\/avtorskie_proekty\/sharzhi_s_alekseeva\/30", // site
uri
"cid": "4fd09c85647e5", // client id
"ses": "4fd09c8564e99", // session id
"ip": NumberInt(1595598202), // client ip
"t": ISODate("2012-06-07T12: 20: 21.0Z"), // request time
"hs": NumberInt(1), // number of hits (on buttons "front-end"
servers we have 5 minute preagregation)
"ix":

{ "0": NumberInt(55), // IX - encodes browser, screensize, operation system "1": NumberInt(180), "2": NumberInt(770), "3": NumberInt(862), "4": NumberInt(1973) }

,
"cm":

{ "0": NumberInt(57389) // CM - encodes country and mobile device }

}

Data to this collection writes from 3 "front-end" servers (actually
buttons, which shows to users). This servers works user-agents,
cookies data, obtain ip, check country and another pre-processing.

One of type of report is statistics for every site by unique host, ip
and users for past 24 hours. This report generates every hour.

So most preferable way to do this report (from many others) is
create collection
{
_id:

{site_id, session}

,
t:DateTime()
}
from SOURCE collection.
Make index on field "t". Data older than 25 hour can be deleted.

Of course I don't want write this collection from "front-end" servers.
Because also need write such collection abount users and ip. In future
may be need write additional collections for new reports.

So my decision is write raw data from button to one collection. And
other proccesing make inside mongo.

Generated at Thu Feb 08 03:10:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.