Core Server / SERVER-533

Aggregation stage to randomly sample documents

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.6
    • Labels: None
    • Environment: Global, all environments
    • Backwards Compatibility: Fully Compatible

      Description

      We've decided to go about this by adding a new aggregation stage: $sample. Given a positive integer, the stage will pseudo-randomly choose that number of documents from the incoming stream of documents, which is implicitly the entire collection when $sample is the first stage in the pipeline.
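As a rough illustration of the stage described above, the sketch below builds a pipeline that ends in $sample. The helper name `build_sample_pipeline` is hypothetical and not part of any driver. The `{"$sample": {"size": N}}` shape follows the syntax of released MongoDB versions; the ticket text itself only says the stage is given a positive integer.

```python
from typing import Any, Dict, List, Optional


def build_sample_pipeline(size: int,
                          match: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Return an aggregation pipeline that ends in a $sample stage.

    Hypothetical helper for illustration; not part of any MongoDB driver.
    """
    # $sample takes a positive integer (bool is rejected: it subclasses int).
    if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
        raise ValueError("size must be a positive integer")

    pipeline: List[Dict[str, Any]] = []
    if match is not None:
        # With a preceding $match, $sample draws only from the matching documents.
        pipeline.append({"$match": match})
    # With no earlier stage, $sample draws from the entire collection.
    pipeline.append({"$sample": {"size": size}})
    return pipeline


print(build_sample_pipeline(1, {"author": "johndoe"}))
```

With PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`.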

      Note that this ticket will only track the aggregation stage functionality, and this implementation will be very slow until SERVER-19183 and SERVER-19182 are resolved.

      Original description:
      Picking a random item from a collection is a common need. For example, you might want a random item from the photos collection. Currently this can be accomplished by counting the documents that match the query, computing a random index within that count, and then fetching the item at that index.
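The count-then-index workaround can be sketched as follows. This is a minimal in-memory illustration, assuming a plain Python list stands in for the collection and a predicate stands in for the query; `random_matching` is a hypothetical name. Against a real server the same three steps would be a count followed by a skip to the chosen index.

```python
import random
from typing import Any, Callable, Dict, List, Optional

Document = Dict[str, Any]


def random_matching(docs: List[Document],
                    matches: Callable[[Document], bool],
                    rng: Optional[random.Random] = None) -> Optional[Document]:
    """Return one uniformly chosen document that satisfies `matches`, or None."""
    if rng is None:
        rng = random.Random()
    # Step 1: count the documents that match the query.
    count = sum(1 for doc in docs if matches(doc))
    if count == 0:
        return None
    # Step 2: compute a random index within that count.
    index = rng.randrange(count)
    # Step 3: walk the matches again and return the one at that index.
    for doc in docs:
        if matches(doc):
            if index == 0:
                return doc
            index -= 1
    return None  # only reachable if docs changed between the two passes


photos = [
    {"_id": 1, "author": "johndoe"},
    {"_id": 2, "author": "janedoe"},
    {"_id": 3, "author": "johndoe"},
]
print(random_matching(photos, lambda d: d["author"] == "johndoe"))
```

The sketch makes two passes over the data, which mirrors the two separate steps (count, then fetch) that the workaround requires.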

      An easier approach would be to request a random item directly from MongoDB for a given query:

      photos.find({"author": "johndoe"}).random()
      // this would act like .next() but would instead return a random item that matches the query

      photos.random_one({"author": "johndoe"})
      // this would act just like find_one, except that it would return a random item that matches the query

          Activity

          charlie.swanson Charlie Swanson added a comment -

          rgpublic, I have updated the description to match what we plan to do.

          davidn David Nadle added a comment -

          Based on the description it sounds like this is sampling without replacement. Right?

          Jeroenooms Jeroen Ooms added a comment -

          Actually, if there were an additional option for sampling with replacement, that would open up some very powerful statistical applications.

          charlie.swanson Charlie Swanson added a comment -

          David Nadle, the short answer is yes: the algorithm samples without replacement. However, the returned documents are not guaranteed to be unique, because the query system does not guarantee uniqueness (see here).

          Jeroen Ooms, while sampling with replacement would be nice, it is out of scope for this part of the project. If this functionality is important to you, please open a new issue so we can track that work.
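The distinction discussed in this thread can be illustrated with Python's standard `random` module. This is only a sketch of the two statistical notions, using a hypothetical five-element population; it says nothing about how the server implements $sample.

```python
import random

# Hypothetical population standing in for the documents of a collection.
population = ["doc%d" % i for i in range(5)]
rng = random.Random(0)

# Without replacement: each element can be drawn at most once,
# so the sample size cannot exceed the population size.
without_replacement = rng.sample(population, 3)

# With replacement: every draw is independent, so elements may repeat
# and the sample may be larger than the population.
with_replacement = rng.choices(population, k=8)

print(without_replacement)
print(with_replacement)
```

Sampling with replacement is what resampling techniques such as the bootstrap rely on, which is presumably the kind of statistical application the request has in mind.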

          xgen-internal-githook Githook User added a comment -

          Author: Charlie Swanson (cswanson310) <charlie.swanson@mongodb.com>

          Message: SERVER-533 Add aggregation stage to randomly sample documents
          Branch: master
          https://github.com/mongodb/mongo/commit/610765fdb94eebf612bd0172ec081ccc21110103

