  1. Core Server
  2. SERVER-14881

Ability to easily save cursor contents to file or collection

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: JavaScript, Querying, Shell
    • Labels:
      None
    • Query Optimization

      It's sometimes desirable to take the results of a find() (or anything else that returns a cursor, such as aggregation) and store the resulting documents somewhere, e.g. in another collection, or in a JSON or BSON file (à la SERVER-12624).

      The idea is that while working interactively in the shell, once you find a query that works well, you can save the results (for use by some other tool) quickly and easily by appending .saveTo({ ns: "db.coll" }) or .saveTo({ file: "output.json" }) to the end of the line, e.g.:

      db.foo.find( { something: "value" }, { something: 1, interesting: 1 } ).limit(5000).saveTo({ db: "some", collection: "where" })
      db.foo.find( { something: "value" }, { something: 1, interesting: 1 } ).limit(5000).saveTo({ file: "sample.json" })
      

      A naive client-side JS implementation might be something like:

      DBQuery.prototype.saveTo = function(target) {
      	if (target.db || target.collection || target.ns) {
      		// Resolve the destination collection.
      		var t;
      		if (target.db && target.collection) {
      			t = this._mongo.getDB(target.db).getCollection(target.collection);
      		} else if (target.collection) {
      			t = this._db.getCollection(target.collection);
      		} else if (target.ns) {
      			t = this._mongo.getCollection(target.ns);
      		} else {
      			// "db" on its own does not identify a collection.
      			throw Error("saveTo: 'db' requires 'collection'");
      		}
      		while (this.hasNext())
      			t.insert(this.next(), target.options, target._allow_dot);
      	} else if (target.file) {
      		// Infer the output type from the file extension if not given.
      		var type = target.type;
      		if (type === undefined) {
      			if (target.file.endsWith(".json")) {
      				type = "json";
      			} else if (target.file.endsWith(".bson")) {
      				type = "bson";
      			}
      		}
      		if (type == "bson") {
      			// SERVER-12624
      			this.dump(target.file);
      		} else if (type == "json") {
      			// One document per line only if requested; otherwise use
      			// tojson()'s default multi-line output (same as "pretty").
      			var oneline = target.oneline ? true : false;
      			// needs fprint() (SERVER-14880)
      			while (this.hasNext())
      				fprint(target.file, tojson(this.next(), "", oneline));
      			fclose(target.file);
      		}
      	}
      };
      

      This might be good enough to start with. Doing this server-side (to eliminate the network traffic) would be possible by using db.eval with nolock.

      Further improvements might include:

      • using bulk inserts to insert a full cursor batch at a time
      • server-side support for an $out parameter for finds (like $out for Map-Reduce and Aggregation). Instead of returning the cursor to the client, the server would internally iterate over the cursor, insert the results into the specified collection, and return the status of this procedure. In this case, the shell saveTo() implementation would reduce to calling _addSpecial, and it could appear anywhere in the function chain rather than only at the end. The file-based output would probably remain client-side, though.
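
As a rough illustration of the first improvement, the sketch below drains a cursor and inserts documents in groups rather than one at a time. The helper name saveCursorInBatches and the batchSize parameter (with its default of 1000) are assumptions made for this example and do not come from the ticket. For simplicity it uses a fixed group size instead of following the cursor's actual server batches, and it relies only on hasNext(), next(), and the fact that the shell's insert() accepts an array of documents.

```javascript
// Hypothetical helper (not part of the shell): copy every document from
// a cursor into a collection, passing up to batchSize documents to each
// insert() call instead of inserting them one by one.
function saveCursorInBatches(cursor, coll, batchSize) {
	batchSize = batchSize || 1000;  // assumed default, not from the ticket
	var batch = [];
	var total = 0;
	while (cursor.hasNext()) {
		batch.push(cursor.next());
		if (batch.length >= batchSize) {
			coll.insert(batch);  // insert() accepts an array of documents
			total += batch.length;
			batch = [];
		}
	}
	// Flush whatever is left over after the cursor is exhausted.
	if (batch.length > 0) {
		coll.insert(batch);
		total += batch.length;
	}
	return total;  // number of documents handed to insert()
}
```

In the shell this might be called as saveCursorInBatches(db.foo.find({ something: "value" }).limit(5000), db.getSiblingDB("some").getCollection("where")). A saveTo() built on this would only need to replace its one-document-at-a-time insert loop.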

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            kevin.pulo@mongodb.com Kevin Pulo
            Votes:
            0
            Watchers:
            8
