[SERVER-40117] "$exit" aggregation stage (with $cond operator support) Created: 14/Mar/19  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Trivial - P5
Reporter: Jonah Werre Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-56832 Provide aggregation stage to abort pi... Closed
Related
related to SERVER-51889 Introduce new stage to peek / debug a... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

It would be nice if we could exit an aggregation pipeline early. For example:

db.books.aggregate([
    {$match:{ _id: ObjectId('5bede977e8dd7e5d5df79dab') }},
    {$limit: 20},
    {$exit: { $cond: [ { $in: ['$genre', ['science fiction', 'fantasy']] }, true, false ] }},
    {$sort: { created: -1 }} // sort is skipped
]);

It would also be nice for debugging longer pipelines:

db.tapping.aggregate([
    {$match:{ _id: ObjectId('5bede977e8dd7e5d5df79dab') }},
    {$exit: true }, // rest of pipeline would be skipped
    ...
]);

Thanks for your consideration.



 Comments   
Comment by Jonah Werre [ 02/May/19 ]

I think it should "$exit" the pipeline and return a value "up to now".

Comment by Asya Kamsky [ 02/May/19 ]

jonah@surveyplanet.com

Is the intent to exit aggregation with no output or with "up to now" output or something else?

Comment by Jonah Werre [ 18/Mar/19 ]

Great, thanks Eric. Looking forward to seeing how it goes.

Comment by Eric Sedor [ 18/Mar/19 ]

jonah@surveyplanet.com, we're assigning this ticket to the appropriate team to be evaluated against our currently planned work. Updates will be posted on this ticket as they happen.

Comment by Andy Schwerin [ 14/Mar/19 ]

Interesting. You're essentially looking to have a branch or data-steering in the pipeline. Coincidentally, one side of your branch does no special work, but I don't think that's inherent.

I wonder if your particular example might be implementable by writing two separate lookup stages, one that only matches if the question type is multiple choice and one if the question type is open-ended? Or by pushing the lookups into the facet stage, which already represents a kind of steering/branching.

Even if such a workaround is available, the general problem of describing a pipeline with data-controlled steering (making it more of a directed graph instead of the tree it is today) is super interesting.
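
The two-lookup workaround suggested above could be sketched roughly as follows. This is only an illustration built from the collection and field names in the reporter's example; the helper `answerLookup` and the field names `allAnswers` and `recentAnswers` are hypothetical, and the pipeline has not been run against the reporter's data.

```javascript
// Sketch of the two-$lookup workaround. `answerLookup`, `allAnswers` and
// `recentAnswers` are hypothetical names; 'answers', '$question', '$type'
// and 'created' come from the example in this ticket.

// Build a $lookup whose sub-pipeline only matches when the outer question
// document has the given type. For any other type it yields an empty array.
function answerLookup(questionType, as, extraStages = []) {
  return {
    $lookup: {
      from: 'answers',
      let: { questionId: '$_id', questionType: '$type' },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$question', '$$questionId'] },
                { $eq: ['$$questionType', questionType] },
              ],
            },
          },
        },
        ...extraStages,
      ],
      as,
    },
  };
}

const workaroundPipeline = [
  // Multiple choice: keep every answer so it can be tabulated later.
  answerLookup('multiple_choice', 'allAnswers'),
  // Open-ended: keep only the 20 most recent answers.
  answerLookup('open_ended', 'recentAnswers', [
    { $sort: { created: -1 } },
    { $limit: 20 },
  ]),
  // At most one of the two arrays is non-empty, so concatenating them
  // produces a single `answers` field for the later stages.
  { $addFields: { answers: { $concatArrays: ['$allAnswers', '$recentAnswers'] } } },
  { $project: { allAnswers: 0, recentAnswers: 0 } },
];
```

The array could then be passed to db.questions.aggregate() after the initial $match. The cost is a second $lookup per question, which is the trade-off against a single conditional stage.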

Comment by Jonah Werre [ 14/Mar/19 ]

This is pseudocode, but the gist of it is that I'm tabulating answers for multiple choice and open-ended questions. For multiple choice questions I need all the answers so I can add them up and produce a summary of responses. But you can't add up open-ended questions, so I'm just showing the most recent 20. For example: 
 

	
     db.questions.aggregate([
		{
			$match: {
				_id: ObjectId('5672f6d70c76e9cb2c053020')
			}
		},
		{
			$lookup: {
				from: 'answers',
				let: { 
					questionId: '$_id',
					questionType: '$type'
				},
				pipeline: [
					
					{ 
						$match: {
							$expr: {
								$eq: [ '$question', '$$questionId' ]
							}
						}
					},
					
					{
						$exit: {
							$cond: {
								if: {
									$ne: [
										'$$questionType', 
										'open_ended' // ---> exit unless the question type is open-ended, so only open-ended answers are sorted and limited
									]
								},
								then: true,
								else: false
							}
						}
					},
					
					{
						$sort: {
							created: -1
						}
					},
					// ---> I cannot have a limit on multiple choice questions since I need to tabulate every value.
					{
						$limit: 20
					},
 
				],
				as: 'answers'
			} 
		},
		//  ...
		// further down the line I'm using a facet to tabulate the answers;
		// however, open-ended responses cannot be tabulated, so I'm returning
		// the 20 most recent
		
		{
			$facet: {
				
				'multiple-choice': [
					{ 
						$match: { 
							'question.type': 'multiple_choice'
						} 
					},
					{
						$unwind: '$answers'
					},
					{
						$group: {
							_id: '$question._id',
							summary: {
								// ... group and tabulate unique responses
							}
						}
					}
				],
				
				'open-ended': [
					{
						$match: { 
							'question.type': 'open_ended'
						} 
					},
					{
						$addFields: {
							summary: '$answers'
						}
					},
				],
			}
		}
 
	]);

The result should look something like this:

[
		
	{
		_id: '5672f6d70c76e9cb2c053020',
		type: 'multiple_choice',
		title: "Yes or No?",
		summary: [
			{ label: 'Yes', total: 2 },
			{ label: 'No', total: 3 }
		]
	},
 
	{
		_id: '5672f6d70c76e9cb2c053020',
		type: 'open_ended',
		title: "What do you think?",
		summary: [
			{ answer: 'Orci varius.', created: '2019-03-14T21:18:15.688Z' },
			{ answer: 'Fusce porttitor.', created: '2019-03-14T21:17:15.688Z' },
			{ answer: 'Aliquam erat.', created: '2019-03-14T21:16:15.688Z' },
			{ answer: 'Integer varius.', created: '2019-03-14T21:15:15.688Z' },
			...
		]
	}
 
]

 

 

Comment by Eric Sedor [ 14/Mar/19 ]

Thanks jonah@surveyplanet.com. We understand the potential value of $exit without $cond. But if you could further explain what you are looking to do with $exit+$cond and its potential benefit for your system, it will help us reason about this request.

Comment by Jonah Werre [ 14/Mar/19 ]

Thanks for the speedy reply, Eric. To clarify, I was thinking $exit would work differently from $match: the later stages of the pipeline would be ignored if $exit evaluates to true for any document.

In the first example, $match would pass any books that were not 'science fiction' or 'fantasy' and sort them, while $exit would skip the $sort altogether if it found any documents of those genres. This would essentially create a conditional $sort, returning either 20 documents unsorted or 20 documents sorted by "created" date.
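
To make that behaviour concrete, here is a minimal plain-JavaScript simulation of it. It is not MongoDB code and says nothing about how the server would implement such a stage; `runWithExit` and the in-memory stage objects are hypothetical names, and the sketch works on whole arrays rather than a stream of documents.

```javascript
// Plain-JavaScript simulation; no MongoDB involved. All names are hypothetical.
// A normal stage maps an array of documents to a new array. An exit stage
// carries a predicate: if it is true for ANY document, the remaining stages
// are skipped and the documents are returned as they are at that point.
function runWithExit(docs, stages) {
  let current = docs;
  for (const stage of stages) {
    if (stage.exit) {
      if (current.some(stage.exit)) return current; // skip the rest of the pipeline
      continue; // condition not met: carry on with the next stage
    }
    current = stage.apply(current);
  }
  return current;
}

const limit = (n) => ({ apply: (docs) => docs.slice(0, n) });
const sortByCreatedDesc = {
  apply: (docs) => [...docs].sort((a, b) => b.created - a.created),
};
const exitOnGenre = {
  exit: (doc) => ['science fiction', 'fantasy'].includes(doc.genre),
};

const stages = [limit(20), exitOnGenre, sortByCreatedDesc];

const withFantasy = [
  { title: 'A', genre: 'fantasy', created: 1 },
  { title: 'B', genre: 'history', created: 2 },
];
const withoutFantasy = [
  { title: 'C', genre: 'poetry', created: 1 },
  { title: 'D', genre: 'history', created: 2 },
];

console.log(runWithExit(withFantasy, stages).map((d) => d.title)); // [ 'A', 'B' ] (sort skipped)
console.log(runWithExit(withoutFantasy, stages).map((d) => d.title)); // [ 'D', 'C' ] (sorted)
```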

On the debugging front, I have used Compass in the past but it can be a little cumbersome keeping pipelines from Compass in sync with production code pipelines so I prefer to stay in my editor/IDE. It's just a personal preference that I'm sure a lot of developers share. As it stands now I often have to comment out big chunks of a pipeline to see the output. It would be nice to be able to "tap" the pipeline at any stage to see the result by adding an $exit stage. As an added benefit any $exit could be changed to false and left there for future debugging. 
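
Until something like this exists on the server, the debugging "tap" can be approximated on the client. The sketch below assumes a hypothetical helper, `applyDebugExit`, that strips `$exit` markers before the pipeline is sent; it only understands literal `true` and `false`, not a `$cond` expression, because the client cannot evaluate per-document conditions.

```javascript
// Hypothetical client-side helper. `$exit` is NOT a real MongoDB stage, so
// the markers must be removed before the pipeline reaches the server.
function applyDebugExit(pipeline) {
  const result = [];
  for (const stage of pipeline) {
    if (Object.prototype.hasOwnProperty.call(stage, '$exit')) {
      if (stage.$exit === true) break; // active tap: drop every later stage
      continue; // {$exit: false} is a disabled tap: drop only the marker
    }
    result.push(stage);
  }
  return result;
}

const debugPipeline = [
  { $match: { status: 'active' } },
  { $exit: false }, // left in place for future debugging
  { $sort: { created: -1 } },
  { $exit: true }, // stop here and inspect the output
  { $limit: 20 },
];

console.log(JSON.stringify(applyDebugExit(debugPipeline)));
// [{"$match":{"status":"active"}},{"$sort":{"created":-1}}]
```

The result of applyDebugExit(...) is an ordinary pipeline array, so it can be passed straight to aggregate() from the editor without commenting out the later stages.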

Comment by Eric Sedor [ 14/Mar/19 ]

P.S. For now, MongoDB Compass includes a pipeline builder feature that can be helpful for debugging large pipelines.

Comment by Eric Sedor [ 14/Mar/19 ]

Thanks for your request jonah@surveyplanet.com. Since an aggregation pipeline streams documents and the example involves an $exit+$cond evaluating documents, can you clarify what your desired behavior would be when a document reaches an $exit stage that evaluates to true?

Specifically, do you envision any documents that successfully passed an $exit:false condition before that point to still be passed through the later pipeline stages and returned as results? If so, then is it accurate to say that this would be similar to a $match stage that automatically excluded all documents after a single document failed to be matched?

Any additional information about the use-cases this feature would service would be helpful!
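
The streaming interpretation asked about above, a $match-like stage that excludes everything once one document triggers the exit, can be illustrated with a small plain-JavaScript simulation. This is only a sketch of that reading, not server behaviour; `exitStage`, `mapStage` and the sample documents are hypothetical.

```javascript
// Plain-JavaScript simulation with generators; all names are hypothetical.
// Documents stream through one at a time. Documents seen before the exit
// condition fires are passed downstream; the triggering document and
// everything after it are dropped.
function* exitStage(docs, shouldExit) {
  for (const doc of docs) {
    if (shouldExit(doc)) return; // stop the stream at the first match
    yield doc;
  }
}

// A stand-in for "the later pipeline stages": transforms each document.
function* mapStage(docs, fn) {
  for (const doc of docs) yield fn(doc);
}

const books = [
  { title: 'A', genre: 'history' },
  { title: 'B', genre: 'poetry' },
  { title: 'C', genre: 'fantasy' },
  { title: 'D', genre: 'history' },
];

const isGenreFiction = (doc) => ['science fiction', 'fantasy'].includes(doc.genre);

const streamed = [...mapStage(exitStage(books, isGenreFiction), (d) => d.title)];
console.log(streamed); // [ 'A', 'B' ]
```

Under this reading the result depends on document order, since 'D' is dropped only because it arrives after 'C'.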

Generated at Thu Feb 08 04:54:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.