[DOCS-14763] $sample aggregation pipeline incorrectly warns "$sample may output the same document more than once in its result set." Created: 26/Aug/21  Updated: 30/Oct/23  Resolved: 13/Oct/21

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Bug Priority: Major - P3
Reporter: David Walker Assignee: Jeffrey Allen
Resolution: Fixed Votes: 0
Labels: aggregation-framework
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 2 years, 17 weeks ago
Epic Link: DOCSP-11701
Story Points: 2

 Description   

The documentation for the $sample aggregation pipeline warns:

$sample may output the same document more than once in its result set.

This warning appears to be left over from the feature's introduction in 3.2, when duplicates were possible with the MMAPv1 storage engine. With WiredTiger, $sample has two methods for obtaining random documents.

The first uses a pseudo-random cursor to select documents. It has a mechanism to prevent duplicates from being returned, and it will error if it cannot complete the deduplication.
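
To make the deduplication idea concrete, here is a minimal Python sketch. It is not the server's implementation: the function name `sample_with_dedup`, the `max_attempts` limit, and the in-memory list standing in for a collection are all assumptions made for illustration. It only models the behavior described above: random picks, duplicates skipped by _id, and an error when enough distinct documents cannot be found.

```python
import random


def sample_with_dedup(docs, size, max_attempts=100, rng=None):
    """Draw `size` distinct documents by repeated random picks.

    Hypothetical model of a random cursor with deduplication. It tracks
    the _id values already returned and raises if it cannot find enough
    distinct documents within `max_attempts` picks.
    """
    rng = rng or random.Random()
    seen_ids = set()
    result = []
    attempts = 0
    while len(result) < size:
        if attempts >= max_attempts:
            # Assumed failure mode: give up with an error instead of
            # returning a short or duplicated result set.
            raise RuntimeError("could not produce enough distinct documents")
        attempts += 1
        doc = rng.choice(docs)
        if doc["_id"] in seen_ids:
            continue  # duplicate pick: skip it rather than return it twice
        seen_ids.add(doc["_id"])
        result.append(doc)
    return result
```

Under this model a caller either receives a result set with no repeated _id or sees an error, which is why the duplicate warning would not apply to this path.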

The second method performs a collection scan by _id, which should never return duplicates when WiredTiger is in use but may have returned duplicates with MMAPv1.
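
A scan-based sample can be sketched in the same spirit. This is again an illustration rather than the server's code: the function name `sample_by_scan` is hypothetical, and the step of ordering documents by a random key is an assumption about how a full scan is turned into a random sample. The point is that each document is visited exactly once, so a stable scan cannot yield the same document twice.

```python
import random


def sample_by_scan(docs, size, rng=None):
    """Visit every document once, tag it with a random key, keep `size`."""
    rng = rng or random.Random()
    # Single pass over the collection: each document appears exactly once.
    keyed = [(rng.random(), doc) for doc in docs]
    # Assumed selection step: order by the random key and take the first `size`.
    keyed.sort(key=lambda pair: pair[0])
    return [doc for _, doc in keyed[:size]]
```

Duplicates could only arise in this model if the scan itself returned a document more than once, which is the MMAPv1 scenario the description refers to.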

My understanding is that the warning applies only where MMAPv1 might be in use as the storage engine, since neither method that $sample uses to obtain random documents returns duplicates under WiredTiger.

As it stands now, this warning may (unnecessarily) prevent this feature from being considered for a number of use cases.



 Comments   
Comment by Githook User [ 13/Oct/21 ]

Author:

{'name': 'Jeff Allen', 'email': 'jeffrey.allen@10gen.com', 'username': 'jeff-allen-mongo'}

Message: (DOCS-14763): Clarify sample agg behavior warning
Branch: v4.0
https://github.com/mongodb/docs/commit/4d24ffc5da45d367076e32608a2a54f82698df47

Comment by Githook User [ 13/Oct/21 ]

Author:

{'name': 'Jeff Allen', 'email': 'jeffrey.allen@10gen.com', 'username': 'jeff-allen-mongo'}

Message: (DOCS-14763): Clarify sample agg behavior warning
Branch: v4.2
https://github.com/mongodb/docs/commit/9df4c43c080f0f535db330a7f0f5076c3d5ac33f

Comment by Githook User [ 13/Oct/21 ]

Author:

{'name': 'Jeff Allen', 'email': 'jeffrey.allen@10gen.com', 'username': 'jeff-allen-mongo'}

Message: (DOCS-14763): Clarify sample agg behavior warning
Branch: v4.4
https://github.com/mongodb/docs/commit/e0b39e9190f7810eff387e2e52606a89b188c013

Comment by Githook User [ 13/Oct/21 ]

Author:

{'name': 'Jeff Allen', 'email': 'jeffrey.allen@10gen.com', 'username': 'jeff-allen-mongo'}

Message: (DOCS-14763): Clarify sample agg behavior warning
Branch: v5.0
https://github.com/mongodb/docs/commit/dcbe816a7a40346d9233f54e47bfc422fd5cd76f

Comment by Githook User [ 13/Oct/21 ]

Author:

{'name': 'Jeff Allen', 'email': 'jeffrey.allen@10gen.com', 'username': 'jeff-allen-mongo'}

Message: (DOCS-14763): Clarify sample agg behavior warning
Branch: master
https://github.com/mongodb/docs/commit/e65601ed005526a554aa08d8aa74ea6f1eddbde4

Generated at Thu Feb 08 08:11:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.