[SERVER-4437] aggregation: support windowing operation on pipelines Created: 06/Dec/11 Updated: 06/Dec/22 Resolved: 01/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Daniel Pasette (Inactive) | Assignee: | Backlog - Query Optimization |
| Resolution: | Done | Votes: | 10 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Query Optimization |
| Description |
|
PostgreSQL supports a windowing capability that allows calculations within a window of visible data; this is useful for streaming data. It's easy to imagine supporting something like a $window pipeline operator that specifies how many documents to include in the window. Within the window, aggregate expressions could reference documents using some kind of relative indexing. This could be used to calculate things like moving averages: for example, with a window of 5 items, create a computed field equal to (doc[0] + doc[-1] + doc[-2] + doc[-3] + doc[-4])/5, or something like that. |
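To make the moving-average idea concrete, here is a minimal Python sketch of a 5-document sliding window over a stream. It is only an illustration of the computation described above, not a proposal for the operator's syntax or implementation. The names `moving_average`, `field`, `size`, and `moving_avg` are hypothetical, and averaging over a partially filled window at the start of the stream is an assumption the description does not settle.

```python
from collections import deque
from typing import Iterable, Iterator


def moving_average(docs: Iterable[dict], field: str, size: int = 5) -> Iterator[dict]:
    """Yield each document with an added 'moving_avg' over the last `size` documents."""
    # A bounded deque drops the oldest value automatically once it is full,
    # so it always holds doc[0], doc[-1], ..., doc[-(size - 1)].
    window = deque(maxlen=size)
    for doc in docs:
        window.append(doc[field])
        # Assumption: before the window fills, average over what has been seen so far.
        yield {**doc, "moving_avg": sum(window) / len(window)}


stream = [{"x": v} for v in (10, 20, 30, 40, 50, 60)]
result = list(moving_average(stream, "x", size=5))
```

Each input document is emitted exactly once, so the stage preserves the document count and only adds a computed field, which is the main difference from a grouping stage.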
| Comments |
| Comment by Asya Kamsky [ 06/Sep/17 ] |
|
More current links to descriptions of GROUP BY windowing/rollup/cube functions: |
| Comment by Chris Westin [ 14/Dec/11 ] |
|
References from a postgres user: As mentioned, SQL window functions might be a useful design from which to draw further feature inspiration, as they are likely compatible with your "pipeline" approach. I am using these on PostgreSQL and find the best documentation there. There are [PGCon] presentation slides from 2009 giving a feature introduction and some implementation details, as well as the standard PostgreSQL docs: [PG0] intro and [PG1] functions. The feature allows a PARTITION to be defined with similar options to GROUP BY, but then, rather than collapsing to a single row for each group, it preserves the original set of rows and allows access to functions of the PARTITION. For example, I make heavy use of the rank() function for a server-side scoring algorithm. [PGCon]: http://www.pgcon.org/2009/schedule/events/128.en.html |
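The PARTITION behaviour described in this comment can be sketched in Python as follows. This is an illustrative approximation, not PostgreSQL's implementation: the function name `rank_over_partition` and the sample fields `team` and `score` are hypothetical, and descending order is assumed for the ranking. It mimics the usual SQL rank() semantics, where tied rows share a rank and the following rank skips ahead.

```python
from collections import defaultdict


def rank_over_partition(rows, partition_key, order_key):
    """Return every input row with a 'rank' computed within its partition."""
    # Group rows by partition without collapsing them (unlike GROUP BY).
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    ranked = []
    for members in partitions.values():
        # Assumption: highest value ranks first.
        ordered = sorted(members, key=lambda r: r[order_key], reverse=True)
        prev_value, prev_rank = None, 0
        for position, row in enumerate(ordered, start=1):
            # Ties share a rank; the next distinct value takes its position (leaving gaps).
            rank = prev_rank if row[order_key] == prev_value else position
            ranked.append({**row, "rank": rank})
            prev_value, prev_rank = row[order_key], rank
    return ranked


rows = [
    {"team": "a", "score": 90},
    {"team": "a", "score": 90},
    {"team": "a", "score": 70},
    {"team": "b", "score": 50},
]
ranked = rank_over_partition(rows, "team", "score")
```

All four input rows come back, each annotated with its rank inside its own partition, which is the property that distinguishes a window function from a grouping aggregate.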