Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-22573

$sample should stream results when possible

    • Type: Icon: Improvement Improvement
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.2.1
    • Component/s: Aggregation Framework
    • Labels:
      None

      Please confirm that the `$sample` aggregation operator streams results to the client immediately whenever possible. If not, consider this a request for improvement.

      For example:

      MongoDB Enterprise > db.serverStatus().storageEngine
      { "name" : "wiredTiger", "supportsCommittedReads" : true }
      MongoDB Enterprise > db.serverBuildInfo().version
      3.2.1
      MongoDB Enterprise > db.users.count()
      10000
      MongoDB Enterprise > db.users.aggregate([{$sample:{size: 5}}])
      

      Here the sampling implementation will use a WiredTiger random cursor to obtain each document. I wish to confirm that the first sample document identified is available to the client before the full sample set is computed.

      Please confirm for each of the sampling mechanism implemented in MongoDB 3.2 (with and without a query predicate), and also sharded cluster behavior.

      Use case: MongoDB Compass. We want to permit users to specify large sample sizes in Compass, then progressively display the results to users. Ideally Compass can begin to display an inferred schema after a very small number of documents are received from the server.

            Assignee:
            charlie.swanson@mongodb.com Charlie Swanson
            Reporter:
            matt.kangas Matt Kangas
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: