[DOCS-9744] $sample can only use a random cursor if it is the first stage in the pipeline. Created: 09/Jan/17 Updated: 30/Oct/23 Resolved: 18/Jan/17 |
|
| Status: | Closed |
| Project: | Documentation |
| Component/s: | Server |
| Affects Version/s: | None |
| Fix Version/s: | Server_Docs_20231030 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Charlie Swanson | Assignee: | Steve Renaker (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: | |
| Days since reply: | 7 years, 3 weeks, 6 days ago |
| Description |
|
The $sample stage docs mention that if N is less than 5% of the documents in the collection, the $sample stage can be optimized to use a random cursor. This is only true if the $sample stage comes first in the pipeline, and the docs should note that. |
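
To make the ordering condition concrete, here is a minimal sketch of two pipeline shapes written as plain Python data, in the form a driver would accept. The `status` field and the helper `sample_is_first_stage` are hypothetical and used only for illustration. The helper checks position only; it says nothing about the 5% size condition, so a `True` result means the pipeline may be eligible for the random cursor, not that the optimization is applied.

```python
def sample_is_first_stage(pipeline):
    """Hypothetical helper: True when the first stage of the pipeline is $sample."""
    return bool(pipeline) and "$sample" in pipeline[0]


# $sample comes first, so the random-cursor optimization is possible
# (subject to the size condition described in the $sample docs).
sample_first = [
    {"$sample": {"size": 10}},
    {"$match": {"status": "A"}},  # "status" is an assumed field name
]

# $sample follows another stage, so the optimization does not apply.
sample_later = [
    {"$match": {"status": "A"}},
    {"$sample": {"size": 10}},
]

print(sample_is_first_stage(sample_first))
print(sample_is_first_stage(sample_later))
```

The two pipelines are not interchangeable: the first samples and then filters, while the second filters and then samples, so they can return different numbers of documents.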
| Comments |
| Comment by Githook User [ 19/Jan/17 ] |
|
Author: Steve Renaker (username: steveren, email: steve.renaker@mongodb.com). Message: Signed-off-by: kay <kay.kim@10gen.com> |
| Comment by Charlie Swanson [ 10/Jan/17 ] |
|
Yes, that is correct. |
| Comment by Steve Renaker (Inactive) [ 09/Jan/17 ] |
|
Thanks charlie.swanson. I take it that's the case for both MMAPv1 and WT? |
| Comment by Charlie Swanson [ 09/Jan/17 ] |
|
steve.renaker If the $sample stage is not the first stage in the pipeline, it cannot be optimized to do any sort of random walk over an index. It falls back to a randomized sort of its input stream of documents. We can't say for certain that it will do a collection scan, since the first stage may be a $match stage that uses an index. Whatever set of documents is output by the stage before $sample, the $sample stage assigns each one a random value between 0 and 1 (attached as metadata, so it does not show up in the output documents), sorts on that value, then takes the first N. |
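
The fallback described in this comment can be sketched in a few lines of Python. This is an illustrative simulation under stated assumptions, not the server's implementation: the function name `sample_fallback`, the in-memory list of documents, and the `status` field are all hypothetical, and the server performs this work inside the aggregation pipeline rather than on a Python list.

```python
import random


def sample_fallback(docs, n, rng=None):
    """Simulate the fallback: attach a random key to each input document,
    sort by that key, and keep the first n documents."""
    rng = rng or random.Random()
    # The key lives beside the document (like metadata), never inside it,
    # so the returned documents are unchanged. The index breaks ties so
    # that two documents are never compared directly.
    keyed = [(rng.random(), index, doc) for index, doc in enumerate(docs)]
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [doc for _, _, doc in keyed[:n]]


# Stand-in for the output of an earlier stage such as $match.
collection = [{"_id": i, "status": "A" if i % 2 == 0 else "B"} for i in range(20)]
matched = [doc for doc in collection if doc["status"] == "A"]

picked = sample_fallback(matched, 3, rng=random.Random(42))
print(len(picked))
```

Every document that reaches the stage is assigned a key and takes part in the sort, so the cost grows with the size of the input stream rather than with N.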
| Comment by Steve Renaker (Inactive) [ 09/Jan/17 ] |
|
charlie.swanson What happens if the $sample stage doesn't come first in the pipeline? Does it use the _id index to randomly select N documents? Thanks. |