[DOCS-9744] $sample can only use a random cursor if it is the first stage in the pipeline. Created: 09/Jan/17  Updated: 30/Oct/23  Resolved: 18/Jan/17

Status: Closed
Project: Documentation
Component/s: Server
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Steve Renaker (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 7 years, 3 weeks, 6 days ago

 Description   

The $sample stage docs mention that if N is less than 5% of the documents in the collection, the $sample stage can be optimized to use a random cursor. This is only true if the $sample stage comes first in the pipeline, and the docs should note that.
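
To make the condition concrete, here is a minimal sketch in Python that checks only the two requirements named in this ticket: $sample is the first stage, and N is below 5% of the collection. The helper `may_use_random_cursor`, the example pipelines, and the `status` field are hypothetical and are not part of any MongoDB API. The server may apply further conditions that this sketch does not model.

```python
def may_use_random_cursor(pipeline, collection_count):
    """Hypothetical check of the two conditions discussed in this ticket.

    pipeline: list of aggregation stage documents (plain dicts).
    collection_count: total number of documents in the collection.
    """
    if not pipeline:
        return False
    first_stage = pipeline[0]
    # Condition 1: $sample must be the first stage in the pipeline.
    if "$sample" not in first_stage:
        return False
    # Condition 2: N must be less than 5% of the collection.
    n = first_stage["$sample"]["size"]
    return n < 0.05 * collection_count


# $sample first: eligible for the random-cursor optimization (if N is small enough).
sample_first = [{"$sample": {"size": 10}}, {"$match": {"status": "A"}}]

# $sample after $match: not eligible, regardless of N.
match_first = [{"$match": {"status": "A"}}, {"$sample": {"size": 10}}]
```

Note that the two example pipelines are not equivalent queries: the first samples the whole collection and then filters, while the second filters and then samples the matching documents.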



 Comments   
Comment by Githook User [ 19/Jan/17 ]

Author: Steve Renaker (steveren) <steve.renaker@mongodb.com>

Message: DOCS-9744: $sample can only use a random cursor if it is the first stage in the pipeline

Signed-off-by: kay <kay.kim@10gen.com>
Branch: master
https://github.com/mongodb/docs/commit/0abc80406bb06b91e9de6657d7032e6b7c605e91

Comment by Charlie Swanson [ 10/Jan/17 ]

Yes, that is correct.

Comment by Steve Renaker (Inactive) [ 09/Jan/17 ]

Thanks charlie.swanson. I take it that's the case for both MMAPv1 and WT?

Comment by Charlie Swanson [ 09/Jan/17 ]

steve.renaker If the $sample stage is not the first stage in the pipeline, it cannot be optimized to do any sort of random walk over an index. It falls back to a randomized sort of its input stream of documents. We can't say for certain that this involves a collection scan, since the first stage may be a $match stage that uses an index. Whatever set of documents is output from the stage before $sample, the $sample stage assigns each one a random value between 0 and 1, sorts on this value (which is attached as metadata and does not show up in the output documents), then takes the first N.
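
The fallback described above can be sketched in plain Python. This is an illustration of the technique only (random key per document, sort, take the first N), not the server's implementation. The function name `sample_by_random_sort` and the optional `rng` parameter are assumptions made for this example.

```python
import random


def sample_by_random_sort(docs, n, rng=None):
    """Pick n documents by sorting the input on a per-document random key."""
    rng = rng or random.Random()
    # Pair each incoming document with a random value in [0, 1).
    # The value is kept beside the document, not inside it, so it
    # never shows up in the output documents.
    keyed = [(rng.random(), doc) for doc in docs]
    # Sort on the random value only.
    keyed.sort(key=lambda pair: pair[0])
    # Take the first n documents and drop the random values.
    return [doc for _, doc in keyed[:n]]


# Stand-in for the output of whatever stage precedes $sample.
docs = [{"_id": i} for i in range(20)]
picked = sample_by_random_sort(docs, 5, random.Random(0))
```

Every input document is read and given a key before any output is produced, which is why this path costs more than a random cursor when the input is large.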

Comment by Steve Renaker (Inactive) [ 09/Jan/17 ]

charlie.swanson What happens if the $sample stage doesn't come first in the pipeline? Does it use the _id index to randomly select N documents? Thanks.

Generated at Thu Feb 08 07:59:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.