[SERVER-29161] Ability to access previous document in $group aggregation Created: 12/May/17 Updated: 06/Dec/22 Resolved: 01/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.5.6 |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Gabriel Zimmermann | Assignee: | Backlog - Query Optimization |
| Resolution: | Duplicate | Votes: | 8 |
| Labels: | expression | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Query Optimization |
| Description |
|
When performing an aggregation query in MongoDB, it would sometimes be useful to reference the previous document. As we have the `$$ROOT` variable, perhaps we could have `$$PREVIOUS` or some other meaningful name that lets us do document-to-document calculations. A possible use case would be calculating the time elapsed between documents more easily, and calculating deltas between documents. |
| Comments |
| Comment by Joe Kanaan [ 01/Jun/21 ] | ||
|
In 5.0, you will be able to use $shift and $setWindowFields to achieve this. | ||
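As a rough illustration of the comment above, the following is a minimal sketch of a pipeline that uses `$setWindowFields` with `$shift` to expose the previous document's value within each partition. It is written as a Python list of dicts, as one might pass to a driver. The field names `deviceId`, `timestamp` and `temperature`, and the output names `previousTemperature` and `temperatureDelta`, are assumptions made for illustration; they do not come from the ticket's actual schema.

```python
# Sketch only: requires MongoDB 5.0+ to run against a server.
# deviceId, timestamp and temperature are assumed field names.
pipeline = [
    {
        "$setWindowFields": {
            # One window per device, ordered by time.
            "partitionBy": "$deviceId",
            "sortBy": {"timestamp": 1},
            "output": {
                # by=-1 reads the value from the document one position earlier
                # in the partition; the first document gets the default (null).
                "previousTemperature": {
                    "$shift": {
                        "output": "$temperature",
                        "by": -1,
                        "default": None,
                    }
                }
            },
        }
    },
    {
        # $subtract yields null when either operand is null,
        # so the first document in each partition has a null delta.
        "$addFields": {
            "temperatureDelta": {
                "$subtract": ["$temperature", "$previousTemperature"]
            }
        }
    },
]
```

With a driver such as PyMongo, a list like this would typically be passed to `collection.aggregate(pipeline)`, where `collection` is a hypothetical handle to the readings collection.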
| Comment by Tad Yeager [ 20/Dec/19 ] | ||
|
We want to coalesce image recognition events that have a start and end time. If the gap between the start time of the $$CURRENT event and the end of the $$PREVIOUS event is less than some duration, we would group the documents into a new event. | ||
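The coalescing logic described in this comment can be sketched in plain Python, outside the database, to make the intended semantics concrete. This is an illustration under assumptions, not the commenter's implementation: the `start` and `end` keys and the function name `coalesce_events` are hypothetical, and the running end of the merged event stands in for the end of the previous event.

```python
from datetime import datetime, timedelta


def coalesce_events(events, max_gap):
    """Merge events whose start is less than max_gap after the previous end."""
    merged = []
    for event in sorted(events, key=lambda e: e["start"]):
        if merged and event["start"] - merged[-1]["end"] < max_gap:
            # Close enough to the previous event: extend it.
            merged[-1]["end"] = max(merged[-1]["end"], event["end"])
        else:
            # Gap too large (or first event): start a new coalesced event.
            merged.append({"start": event["start"], "end": event["end"]})
    return merged


t0 = datetime(2019, 12, 20, 12, 0, 0)
sample = [
    {"start": t0, "end": t0 + timedelta(seconds=10)},
    {"start": t0 + timedelta(seconds=12), "end": t0 + timedelta(seconds=20)},
    {"start": t0 + timedelta(seconds=60), "end": t0 + timedelta(seconds=70)},
]
coalesced = coalesce_events(sample, max_gap=timedelta(seconds=5))
```

Here the first two sample events are 2 seconds apart and are merged, while the third starts 40 seconds after the merged event ends and stays separate.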
| Comment by Brett Gray [ 05/Dec/19 ] | ||
|
asya I have a customer that could also use the suggested $$PREVIOUS to assist in calculating the variance of products created compared to the previous week. | ||
| Comment by Asya Kamsky [ 17/Jan/18 ] | ||
|
It's conceivable that this would be a useful expression in any stage (i.e. $project/$addFields). | ||
| Comment by Asya Kamsky [ 05/Jul/17 ] | ||
Okay, I think I understand the example. I do think the linked ticket, SERVER-29339, would allow something similar by allowing saving/accumulating previous value(s). We will need to figure out which syntax would allow for a more readable (and/or more performant) implementation. | ||
| Comment by Gabriel Zimmermann [ 04/Jul/17 ] | ||
Yeah, in some cases we store only values that have changed since the previous reading, in other cases we are particularly interested in storing deltas directly. However, I think with the rise of IoT applications (and my company is particularly invested in this), the need to store sensor data or data from devices that produce a really large volume of information is growing, and pushing deltas for each device might become expensive/convoluted. Unfortunately, our aggregation queries are complex enough at the moment, and they take a fair number of seconds to execute.
Please excuse my ignorance here, I do not know how the `$group` stage works under the hood, so this request is honestly coming from my lack of knowledge. What I meant was that I want `$$PREVIOUS` (or a more suitable name) to point to the previous `$$ROOT` processed for that group. So if my group is `_id: "$deviceId"`, when I'm processing the second document for that group, `$$PREVIOUS` will be a reference to the first document, and so on, so that I can do something like "$$PREVIOUS.temperature - $$ROOT.temperature". Does that make sense? Thanks | ||
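To make the requested semantics concrete, here is a small plain-Python emulation of what a per-group `$$PREVIOUS` might compute. It is only a sketch of the behaviour described in the comment, not of how `$group` works internally; the function name `deltas_by_group` and the `delta` output field are hypothetical, and the documents are assumed to arrive already in the desired order.

```python
def deltas_by_group(documents, key="deviceId", field="temperature"):
    """For each document, compute previous[field] - current[field] per group."""
    previous = {}  # last document seen per group, playing the role of $$PREVIOUS
    results = []
    for doc in documents:
        prev = previous.get(doc[key])
        # The first document of a group has no predecessor, so its delta is None.
        delta = None if prev is None else prev[field] - doc[field]
        results.append({**doc, "delta": delta})
        previous[doc[key]] = doc
    return results


readings = [
    {"deviceId": "a", "temperature": 20},
    {"deviceId": "b", "temperature": 5},
    {"deviceId": "a", "temperature": 23},
    {"deviceId": "a", "temperature": 21},
]
with_deltas = deltas_by_group(readings)
```

The second and third readings for device `a` get deltas of -3 and 2, matching the `$$PREVIOUS.temperature - $$ROOT.temperature` ordering in the comment.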
| Comment by Asya Kamsky [ 17/May/17 ] | ||
|
My question was due to having some solutions similar to the one charlie.swanson linked to. In general, such computations are done for a particular time period, right? So pushing all deltas for each "device" to an array should be pretty doable. Though I'm guessing you don't mean deltas but rather storing only values that have changed since the previous reading? The issue with providing "$$PREVIOUS" is that the previous document may have been transformed during processing, so should this be the document before it was "processed" or after? Comparing elements in the array is simple enough (another example is here), but it's less clear to me that providing $$PREVIOUS would make the resultant pipeline easier to understand. We will discuss this request during our next planning cycle, thanks for the suggestion! | ||
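The array-based approach mentioned here might look roughly like the sketch below, written as a Python list of dicts: push each device's readings into an array, then compare adjacent elements by index. This is one possible shape, not the pipeline from the linked examples; the field names `deviceId`, `timestamp` and `temperature` are assumptions, and the sketch relies on `$push` collecting values in the order documents reach `$group` after the `$sort`.

```python
# Sketch only: field names are assumed, and the per-device array
# must fit within the document size limit for this to work.
array_pipeline = [
    {"$sort": {"deviceId": 1, "timestamp": 1}},
    # Collect every reading for a device into one array.
    {"$group": {"_id": "$deviceId", "readings": {"$push": "$temperature"}}},
    {
        "$project": {
            "deltas": {
                "$map": {
                    # Indexes 1 .. size-1, so each element has a predecessor.
                    "input": {"$range": [1, {"$size": "$readings"}]},
                    "as": "i",
                    "in": {
                        "$subtract": [
                            {"$arrayElemAt": ["$readings", "$$i"]},
                            {
                                "$arrayElemAt": [
                                    "$readings",
                                    {"$subtract": ["$$i", 1]},
                                ]
                            },
                        ]
                    },
                }
            }
        }
    },
]
```

Each element of `deltas` is the current reading minus the one before it, so a device with n readings produces n-1 deltas.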
| Comment by Gabriel Zimmermann [ 17/May/17 ] | ||
|
Yeah, in my particular case, the size of our collection makes it almost impossible for us to create an array, unwind, etc. That's clever though. I think the big picture is that when you are grouping, you often need a bit of context to process a given document; having access to the previous value makes it easier to perform conditional computations. | ||
| Comment by Charlie Swanson [ 17/May/17 ] | ||
|
I think this Stack Overflow question is related; I had to do some crazy stuff to reference the previous values. | ||
| Comment by Gabriel Zimmermann [ 17/May/17 ] | ||
|
Asya. One of the use cases is for IoT projects my company is developing for a big client. Most of this data is sensor measurements across time. At some point the volume of the data became too large and we moved to storing only changes in the measurements. So instead of storing things like the temperature value every 5 seconds, we store only deltas in the temperature. Since we are dealing with changes, sometimes we want to perform calculations based on these deltas. So in each document I'm forced to store the current value and the previous value so I can compare them. Sometimes I need to know if the temperature increased with respect to the last one, and I can't determine this at the moment in which I store it, or I'd store it pre-calculated already. Other times I need to perform conditional logic based on whether a previous distance reading was greater/less than the current reading in a $group operation. It boils down to the fact that when you're storing deltas to some piece of information, often to reconstruct it, when you $group you need to perform delta-to-delta calculations. If I had access to the `previous` document, it would be trivial to do this, but for now each document stores its data, plus a bulk of data from the previous one. I think in general this would enable the aggregation framework to achieve a new level of complexity in the algorithms you can write, especially when the volume of the data is so large that it becomes convenient to store only changes to the data. | ||
| Comment by Asya Kamsky [ 16/May/17 ] | ||
|
arg20 can you give a brief description of your use case? What calculation/processing are you currently trying to do, etc.? |