[SERVER-14881] Ability to easily save cursor contents to file or collection Created: 13/Aug/14  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: JavaScript, Querying, Shell
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Kevin Pulo Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-14880 Ability to output to file from mongo ... Backlog
duplicates SERVER-12624 Support writing to (bson) files from ... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

It's sometimes desirable to take the results of a find() (or something else that returns a cursor, like aggregation) and store the resulting documents somewhere, e.g. in some other collection, or in a JSON or BSON file (à la SERVER-12624).

The idea is that while working interactively in the shell, once you find a query that works well, you can save the results (for use by some other tool) quickly and easily by just adding .saveTo({ ns: "db.coll" }) or .saveTo({ file: "output.json" }) to the end of the line, e.g.:

db.foo.find( { something: "value" }, { something: 1, interesting: 1 } ).limit(5000).saveTo({ db: "some", collection: "where" })
db.foo.find( { something: "value" }, { something: 1, interesting: 1 } ).limit(5000).saveTo({ file: "sample.json" })

A naive client-side JS implementation might be something like:

DBQuery.prototype.saveTo = function(target) {
	var t;  // destination collection; declared here so it is not an implicit global
	if (target.db || target.collection || target.ns) {
		if (target.db && target.collection) {
			t = this._mongo.getDB(target.db).getCollection(target.collection);
		} else if (target.collection) {
			t = this._db.getCollection(target.collection);
		} else if (target.ns) {
			t = this._mongo.getCollection(target.ns);
		}
		while (this.hasNext())
			t.insert(this.next(), target.options, target._allow_dot);
	} else if (target.file) {
		if (target.type === undefined) {
			if (target.file.endsWith(".json")) {
				target.type = "json";
			} else if (target.file.endsWith(".bson")) {
				target.type = "bson";
			}
		}
		if (target.type == "bson") {
			// SERVER-12624
			this.dump(target.file);
		} else if (target.type == "json") {
			// "oneline" wins over "pretty"; with neither set, keep tojson()'s
			// default multi-line output. Declared locally so it is never an
			// undefined global when neither option is given.
			var oneline = target.oneline ? true : false;
			// needs fprint() (SERVER-14880)
			while (this.hasNext())
				fprint(target.file, tojson(this.next(), "", oneline));
			fclose(target.file);
		}
	}
};

This might be good enough to start with. Doing this server-side (to eliminate the network traffic) would be possible by using db.eval with nolock.

Further improvements might include:

  • using bulk inserts to insert a full cursor batch at a time
  • server-side support for an $out parameter for finds (like $out for Map-Reduce and Aggregation). Instead of returning the cursor to the client, the server would internally iterate over the cursor, insert the results into the specified collection, and return the status of this procedure. In this case, the shell saveTo() implementation would reduce to calling _addSpecial, and it could appear anywhere in the function chain rather than only at the end. The file-based output would probably remain client-side, though.
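
The bulk-insert idea in the first bullet could be sketched roughly as follows. This is an illustration only, not a proposed patch: saveInBatches and its batchSize parameter are hypothetical names, the cursor is assumed to expose hasNext()/next() as in the code above, and the target is assumed to accept an array of documents through insertMany(). The in-memory stand-ins exist only so the sketch can run outside the shell.

```javascript
// Hypothetical helper: drain a cursor into a collection, one batch at a time.
// "cursor" needs hasNext()/next(); "target" needs insertMany(arrayOfDocs).
function saveInBatches(cursor, target, batchSize) {
    batchSize = batchSize || 1000;  // assumed default, not from the ticket
    var batch = [];
    var total = 0;
    while (cursor.hasNext()) {
        batch.push(cursor.next());
        if (batch.length >= batchSize) {
            target.insertMany(batch);
            total += batch.length;
            batch = [];
        }
    }
    if (batch.length > 0) {  // flush the final partial batch
        target.insertMany(batch);
        total += batch.length;
    }
    return total;  // number of documents written
}

// In-memory stand-ins (assumptions for illustration, not shell objects).
function arrayCursor(docs) {
    var i = 0;
    return {
        hasNext: function() { return i < docs.length; },
        next: function() { return docs[i++]; }
    };
}
function memoryCollection() {
    return {
        docs: [],
        calls: 0,
        insertMany: function(batch) {
            this.calls++;
            this.docs = this.docs.concat(batch);
        }
    };
}
```

In the shell this might be called as saveInBatches(db.foo.find({ something: "value" }).limit(5000), db.getSiblingDB("some").getCollection("where"), 500), which issues one insert per 500 documents instead of one per document.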


 Comments   
Comment by Kyle Suarez [ 26/Oct/21 ]

If the desire is to have this functionality in the shell, perhaps we should move this to the MONGOSH project?

Comment by Asya Kamsky [ 19/Jul/18 ]

kevin.pulo, does $out not support something like this (it saves results on the server, which can then be used elsewhere)? I understand that syntactically it's an option to aggregate and not find, but otherwise is that the general idea?
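
For comparison, the find() from the description can be approximated as an aggregation ending in $out, which writes the results server-side. The sketch below only builds the pipeline. findAsOutPipeline is a hypothetical helper name, and the stage mapping (filter to $match, limit to $limit, projection to $project) is an assumption that covers simple queries, not every find() option. Note that $out must be the last stage and replaces the contents of the target collection.

```javascript
// Hypothetical helper: translate simple find() arguments into an
// aggregation pipeline whose final stage writes to a collection.
function findAsOutPipeline(filter, projection, limit, outCollection) {
    var pipeline = [{ $match: filter || {} }];
    if (limit > 0) {
        pipeline.push({ $limit: limit });
    }
    if (projection && Object.keys(projection).length > 0) {
        pipeline.push({ $project: projection });
    }
    pipeline.push({ $out: outCollection });  // must be the final stage
    return pipeline;
}

// Pipeline equivalent to the first saveTo() example in the description,
// writing to a collection named "where" in the current database.
var pipeline = findAsOutPipeline(
    { something: "value" },
    { something: 1, interesting: 1 },
    5000,
    "where");
// In the shell, this would then be run as: db.foo.aggregate(pipeline)
```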

 
