  1. Core Server
  2. SERVER-13201

Allow new Aggregation $out operator to explicitly name a DB to write to

    Details

      Description

      Using 2.6.0rc1 (Linux x86-64) I've been doing some research into speeding up some Aggregation use cases via Parallelisation. For the full investigation see here: http://pauldone.blogspot.co.uk/2014/03/mongoparallelaggregation.html

      One of the main outcomes was that, although a good speed-up can be achieved with multiple threads each running aggregate() on a subset of the collection's data, further performance improvement was held back mainly by the threads queueing for the DB write-lock when writing to result collections in the same database.

      In the tests, the $out operator (http://docs.mongodb.org/master/reference/operator/aggregation/out/) is used to specify a different output collection for each thread's aggregate() invocation. However, the $out operator does not allow one to specify a named database in addition to a named collection. As a result, the same database as the aggregation's source collection is assumed, and it's not possible to use different databases to remove the write-lock bottleneck for such use cases.
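To make the setup above concrete, here is a minimal sketch of how per-thread pipelines might be built, each covering one key range and ending in its own $out collection. The field name `dim0`, the `aggdata_part` collection prefix, the range boundaries and the $group stage are hypothetical and only illustrate the shape; they are not taken from the tests described here. Note that $out accepts only a collection name, so every output lands in the source database.

```python
def partition_pipelines(boundaries, key="dim0", out_prefix="aggdata_part"):
    """Build one aggregation pipeline per worker thread.

    Each pipeline matches one half-open key range [low, high) and writes its
    result to a distinct collection via $out. The key and prefix defaults are
    hypothetical placeholders.
    """
    pipelines = []
    for i, (low, high) in enumerate(zip(boundaries, boundaries[1:])):
        pipelines.append([
            # Restrict this worker to its own slice of the source collection.
            {"$match": {key: {"$gte": low, "$lt": high}}},
            # Illustrative aggregation step: count documents per key value.
            {"$group": {"_id": "$" + key, "count": {"$sum": 1}}},
            # $out takes a collection name only; the database is implied.
            {"$out": f"{out_prefix}{i}"},
        ])
    return pipelines


# Four workers over the hypothetical key range 0..1000.
pipelines = partition_pipelines([0, 250, 500, 750, 1000])
```

Each worker thread would then pass its own pipeline to aggregate() on the same source collection, which is where the shared write-lock contention arises.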

      Please consider enhancing the $out operator to support declaring a target database in addition to a target collection, in a similar manner to how this can already be achieved today in MongoDB's MapReduce function (specifically the mapReduce() function's 'out' option - http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#mapreduce-out-mtd )
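For comparison, this is a rough sketch of the mapReduce command document with the existing 'out' option naming a target database through its `db` field. The collection names, database name, and the map and reduce bodies are hypothetical; with a driver, the JavaScript strings may need wrapping in that driver's code type.

```python
def map_reduce_command(source_coll, out_coll, out_db):
    """Build a mapReduce command document whose output goes to another database.

    The command name must be the first key; Python dicts preserve insertion
    order, so the document can be passed to a driver's command helper as-is.
    """
    return {
        "mapReduce": source_coll,
        # Hypothetical map/reduce bodies: count documents per dim0 value.
        "map": "function () { emit(this.dim0, 1); }",
        "reduce": "function (key, values) { return Array.sum(values); }",
        # 'db' names the target database alongside the target collection.
        "out": {"replace": out_coll, "db": out_db},
    }


# Hypothetical names: write results for one worker into a separate database.
cmd = map_reduce_command("aggdata", "results_part0", "results_db0")
```

An equivalent database field on $out would let each aggregate() invocation target a different database in the same way.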

      Thanks Paul

          Activity

          dancerjohn John Butler added a comment -

          One of the reasons this is very helpful is that it would allow writing temp / search results to a DB that is not part of a replica set.

          jon.rangel Jon Rangel added a comment -

          This is also useful in a sharded cluster. If the output of an aggregation can go to a different database, then the outputs from different aggregations can be sent to different shards. Queries against those output collections then do not bottleneck on the primary shard of the source database.


            People

            • Votes: 13
            • Watchers: 14
