Documentation / DOCS-14763

$sample aggregation pipeline incorrectly warns "$sample may output the same document more than once in its result set."

      The documentation for the $sample aggregation pipeline warns:

      $sample may output the same document more than once in its result set.
      

      This warning appears to be a holdover from the introduction of $sample in 3.2, when duplicates were possible with the MMAPv1 storage engine. With WiredTiger, $sample has two methods for obtaining random documents.

      The first method uses a pseudo-random cursor to select documents. It has a means of preventing duplicates from being returned, and it will return an error if it cannot deduplicate the results.
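
      The behavior described for the first method (random selection, deduplication, and an error when deduplication falls short) can be illustrated with a small conceptual model. This is only a sketch of the idea and is not MongoDB's implementation; the function name, the `max_attempts_per_doc` limit, and the use of a plain list of ids are all assumptions made for illustration.

```python
import random


def sample_without_duplicates(ids, size, max_attempts_per_doc=100, rng=None):
    """Conceptual model only: draw random ids, skip ones already returned,
    and raise if too many draws pass without reaching `size` distinct ids.

    `max_attempts_per_doc` is a hypothetical limit chosen for this sketch.
    """
    rng = rng or random.Random()
    seen = set()
    result = []
    attempts = 0
    limit = size * max_attempts_per_doc
    while len(result) < size:
        if attempts >= limit:
            # Mirrors "will error if it cannot deduplicate".
            raise RuntimeError("could not produce enough distinct documents")
        attempts += 1
        candidate = rng.choice(ids)
        if candidate in seen:
            continue  # duplicate draw: discard and try again
        seen.add(candidate)
        result.append(candidate)
    return result
```

      In this model a caller either receives `size` distinct ids or gets an error; it never receives a result set containing a repeated id.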

      The second method performs a collection scan by _id, which should never return duplicates under WiredTiger but could have returned duplicates under MMAPv1.

      My understanding is that the warning applies only when MMAPv1 may be in use as the storage engine, since neither method that $sample uses to obtain random documents returns duplicates under WiredTiger.
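
      One way to check this claim empirically is to run $sample and look for repeated _id values in the result. The following is a minimal sketch, assuming the results are available as a list of dictionaries with hashable _id values; the helper names `sample_pipeline` and `duplicate_ids`, and the database and collection names in the usage comment, are hypothetical.

```python
from collections import Counter


def sample_pipeline(size):
    """Build an aggregation pipeline that requests `size` random documents."""
    return [{"$sample": {"size": size}}]


def duplicate_ids(docs):
    """Return the _id values that appear more than once in `docs`."""
    counts = Counter(doc["_id"] for doc in docs)
    return [doc_id for doc_id, n in counts.items() if n > 1]


# Hypothetical usage against a live deployment (requires PyMongo and a server):
#   from pymongo import MongoClient
#   coll = MongoClient()["test"]["items"]   # assumed database/collection names
#   docs = list(coll.aggregate(sample_pipeline(1000)))
#   print(duplicate_ids(docs))              # an empty list means no duplicates
```

      An empty list across repeated runs would be consistent with the reasoning above, although it would not by itself prove that duplicates are impossible.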

      As it stands now, this warning may (unnecessarily) prevent this feature from being considered for a number of use cases.

            Assignee:
            jeffrey.allen@mongodb.com Jeffrey Allen
            Reporter:
            dave.walker@mongodb.com David Walker
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved:
              2 years, 26 weeks, 6 days ago