[SERVER-24710] Optimize $sample+$project Created: 22/Jun/16  Updated: 06/Dec/22  Resolved: 22/Jun/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Ross Lawley Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-21887 $sample takes disproportionately long... Closed
duplicates SERVER-23661 $sample takes disproportionately long... Closed
Related
is related to SPARK-64 Sampling then projecting in the Mongo... Closed
Assigned Teams:
Query
Participants:

 Description   

When running a pipeline that does a $sample followed by a $project, I noticed it was much slower than doing the $project first and then the $sample. Could this be optimized, in a similar fashion to the optimizations described at https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization

For example, with the MovieLens dataset (~1 million documents):

Pipeline: $sample + $project _id: 76120 ms
Pipeline: $project _id + $sample: 1124 ms
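
A minimal sketch of how the two orderings could be built and timed. The ticket does not state the sample size, the driver, or the collection name, so the `size` argument and the helper names below are assumptions made for illustration only. `time_pipeline` accepts any object exposing an `aggregate(pipeline)` method, such as a PyMongo collection.

```python
import time


def sample_then_project(size):
    # $sample first, then keep only _id (the slow ordering in the report).
    return [{"$sample": {"size": size}}, {"$project": {"_id": 1}}]


def project_then_sample(size):
    # $project first, then $sample (the fast ordering in the report).
    return [{"$project": {"_id": 1}}, {"$sample": {"size": size}}]


def time_pipeline(collection, pipeline):
    # Run the pipeline, drain the cursor, and return
    # (number of documents returned, elapsed milliseconds).
    start = time.perf_counter()
    count = sum(1 for _ in collection.aggregate(pipeline))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return count, elapsed_ms
```

With PyMongo, this might be called as `time_pipeline(client.movielens.ratings, sample_then_project(1000))`, where the database name, collection name, and sample size are all hypothetical.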



 Comments   
Comment by Ross Lawley [ 22/Jun/16 ]

Duplicate of SERVER-23661

Tested on 3.2.7 with data loaded via mongorestore.

$sample + $project is much slower than $project + $sample until the mongod is restarted. Then it's much faster.

Comment by Ross Lawley [ 22/Jun/16 ]

Looks to be a duplicate of SERVER-21887

Generated at Thu Feb 08 04:07:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.