[SERVER-24804] Enable larger result sets within $facet stage Created: 27/Jun/16  Updated: 05/Dec/22

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Greg Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-35641 $FACET throws "BufBuilder attempted t... Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Participants:
Case:

 Description   

SERVER-23654 introduced the $facet stage, which combines all outputs of the sub-pipelines into one result document. This is a request to enable some mechanism for returning more than 16MB of results, while still being able to process the same initial result set in multiple ways.
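For illustration, a minimal sketch of how $facet processes one initial result set in multiple ways. The function name, the sub-pipeline names (countsByFactor, total) and their stages are assumptions chosen for this example; they do not come from the original request.

```javascript
// Sketch only: sub-pipeline names and stages are illustrative assumptions.
function buildFacetPipeline(query) {
  return [
    // Shared stages: executed once, then fed to every sub-pipeline.
    { $match: query },
    { $unwind: "$factors" },
    {
      $facet: {
        // Hypothetical "operation 1": count occurrences of each factor.
        countsByFactor: [{ $sortByCount: "$factors" }],
        // Hypothetical "operation 2": count the unwound documents.
        total: [{ $count: "n" }],
      },
    },
  ];
}

// In the mongo shell this could be run as (the query is a placeholder):
//   db.cases.aggregate(buildFacetPipeline({ status: "open" }))
// The result is a single document shaped like
//   { countsByFactor: [...], total: [...] }
```

Because every sub-pipeline's output lands in an array inside that one document, the combined output is subject to the 16MB document limit that this ticket asks to lift.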

Original Request

I have several similar aggregation operations at the same time, for example

db.cases.aggregate([
   {$match : query},
   {$unwind : "$factors"},
 
   //operation 1 of the above result
   // ...
])
 
db.cases.aggregate([
   {$match : query},
   {$unwind : "$factors"},
 
   //operation 2 of the above result
   // ...
])

The first two stages of the aggregation ($match, $unwind) are the same, and I think it would be a waste to repeat them. So I am asking whether there is a way to fork the pipeline so that both operations can share the result of the first two stages, as follows:

db.cases.aggregate([
   {$match : query},
   {$unwind : "$factors"},
   forks : [
      {... operation 1},
      {... operation 2}
   ]
])

http://stackoverflow.com/questions/38047527/fork-the-pipeline-of-aggregation



 Comments   
Comment by Charlie Swanson [ 27/Jun/16 ]

profesor79 OK. In that case, I'm re-opening this ticket as a feature request to support larger result sets within the $facet stage.

Comment by Greg [ 27/Jun/16 ]

charlie.swanson I'm using MongoDB for log persistence, so weekly results are sometimes larger than 60-70MB per query.

Comment by Charlie Swanson [ 27/Jun/16 ]

profesor79 no, $facet will error if the resulting document exceeds 16MB. It is thought to be unlikely for this to happen, although that is a legitimate request. What is your use case where you expect a large number of documents output within the $facet stage?

Comment by Greg [ 27/Jun/16 ]

charlie.swanson Will the $facet output array work when the resulting document is larger than 16MB?

Comment by Greg [ 27/Jun/16 ]

As we have only one cursor, my idea was to have a kind of fork pipe in which we could tag the forked results:

{$fork:{
[tag1] :[stages],
[tag2] :[stages],
[tagn] :[stages],
}}

Result documents would then have a field such as fork: tag1 to show which forked pipe added the document to the main cursor.

Does this make sense?
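A rough client-side sketch of the tagging idea, under stated assumptions: no $fork stage exists in the server, so this helper (buildTaggedPipelines, a hypothetical name) builds one ordinary pipeline per tag and uses $addFields to stamp each output document with its tag. The shared prefix and the per-tag stages are placeholders.

```javascript
// Sketch only: emulates the proposed tagging on the client; $fork is not a real stage.
function buildTaggedPipelines(sharedStages, forks) {
  const pipelines = {};
  for (const [tag, stages] of Object.entries(forks)) {
    pipelines[tag] = [
      ...sharedStages,
      ...stages,
      // Tag every output document with the fork it came from.
      { $addFields: { fork: tag } },
    ];
  }
  return pipelines;
}

const pipelines = buildTaggedPipelines(
  // Hypothetical shared prefix.
  [{ $match: { level: "error" } }, { $unwind: "$factors" }],
  {
    // Hypothetical per-tag stages.
    tag1: [{ $sortByCount: "$factors" }],
    tag2: [{ $project: { _id: 0, factors: 1 } }],
  }
);

// In the mongo shell, each pipeline could then be run on its own cursor:
//   db.cases.aggregate(pipelines.tag1)
//   db.cases.aggregate(pipelines.tag2)
```

Each document comes back individually, so the combined output is not held in a single 16MB document. The trade-off is that the shared stages run once per tag, which is exactly the duplication the original request wanted to avoid.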

Comment by Charlie Swanson [ 27/Jun/16 ]

Hi profesor79, this looks like the $facet stage we just introduced! I'm closing it as a duplicate of that ticket. Please re-open if you feel like this is a distinct request, and clarify what you would like that is not covered by SERVER-23654. Thanks!

Comment by Ramon Fernandez Marina [ 27/Jun/16 ]

profesor79, I'm sending this to the Query team for consideration. I think more details and discussion will be needed on how to distinguish the output of each operation, so if you have thought about this and could provide further details of the behavior you have in mind, that would be helpful.

Thanks,
Ramón.
