[SERVER-77170] Handle non "neutral" init functions in $accumulator Created: 16/May/23  Updated: 18/Jul/23  Resolved: 18/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ivan Fefer Assignee: Backlog - Query Execution
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-77531 Make $accumulate resilient to spilling Closed
Related
Assigned Teams:
Query Execution
Participants:
Linked BF Score: 45

 Description   

$accumulator allows users to specify their own accumulator using JavaScript.

It has several functions. We will focus on these three:

init - creates a new group (may use the group id as arguments).
accumulate - adds a new document to the group.
merge - merges two groups (in case of sharding or spilling).

Currently our code assumes that we can call init() any number of times.

For example, consider a group of two documents.

If there is no spilling, we will call the following functions:

init() -> accumulate() -> accumulate()

However, if we spill after each document, we will have the following sequence of calls:

init() -> accumulate() -> /spill/ -> init() -> accumulate() -> /spill/ -> init() -> merge() -> merge().

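If init() doesn't return a neutral element with respect to accumulate() and merge(), the final result depends on how many times init() was called, so the same group can produce different output depending on whether we spilled.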
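To make the two call sequences above concrete, here is a minimal sketch in plain JavaScript. It does not use the server or the real $accumulator machinery: the accumulator objects and the two driver functions (runWithoutSpilling, runSpillingAfterEachDoc) are hypothetical names made up for this illustration, and the spilling driver only mimics the sequence described in this ticket.

```javascript
// Hypothetical accumulator that counts documents but starts from a
// non-neutral state: every call to init() contributes 100 to the count.
const badAcc = {
  init: () => ({ count: 100 }),
  accumulate: (state, _doc) => ({ count: state.count + 1 }),
  merge: (a, b) => ({ count: a.count + b.count }),
};

// Same accumulator, but init() returns the neutral element for merge().
const goodAcc = { ...badAcc, init: () => ({ count: 0 }) };

// init() -> accumulate() -> accumulate()
function runWithoutSpilling(acc, docs) {
  let state = acc.init();
  for (const doc of docs) {
    state = acc.accumulate(state, doc);
  }
  return state;
}

// init() -> accumulate() -> /spill/ -> ... -> init() -> merge() -> merge()
function runSpillingAfterEachDoc(acc, docs) {
  const spilled = [];
  for (const doc of docs) {
    // Each document starts a fresh partial group, which is then spilled.
    spilled.push(acc.accumulate(acc.init(), doc));
  }
  // After spilling, a new state is created and all partial groups are merged in.
  let state = acc.init();
  for (const partial of spilled) {
    state = acc.merge(state, partial);
  }
  return state;
}

const docs = [{}, {}];
console.log(runWithoutSpilling(goodAcc, docs).count);      // 2
console.log(runSpillingAfterEachDoc(goodAcc, docs).count); // 2
console.log(runWithoutSpilling(badAcc, docs).count);       // 102
console.log(runSpillingAfterEachDoc(badAcc, docs).count);  // 302
```

With the neutral init() both paths agree. With the non-neutral one, the extra init() calls introduced by spilling each add their own starting value to the result.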

In this ticket we should discuss how we want to handle this case: either assert init is correct and provide users with a comprehensive error message or re-design $accumulator to call init exactly once.

We should also note our decision in the documentation to make the requirement explicit.



 Comments   
Comment by Ana Meza [ 18/Jul/23 ]

Closing as Won't Fix in favor of DOCSP-31353.

Comment by Xiaochen Wu [ 13/Jul/23 ]

DOCSP-31353 has been created to track this.

Comment by Xiaochen Wu [ 13/Jul/23 ]

Since we do not recommend that customers use this function, we can just document the behavior and close the ticket as Won't Fix.

Creating a DOC ticket and sending this back to engineering triage.

Comment by Ivan Fefer [ 12/Jul/23 ]

Basically, init() should have the following properties:

  1. merge(init(), <group>) == <group>
  2. accumulate(init(), <group>) == <group>

So it should return a neutral or identity element in relation to accumulate() and merge() operations.
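As a rough illustration of property 1, the sketch below checks merge(init(), <group>) == <group> over a few sample states. The helper name initIsNeutralForMerge, the sample accumulators, and the use of JSON serialization for equality are all assumptions made for this example. Passing on a handful of samples does not prove the property in general.

```javascript
// Hypothetical helper: returns true if merge(init(), g) equals g for every
// sample state g. Equality is approximated by comparing JSON serializations.
function initIsNeutralForMerge(acc, sampleStates) {
  const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);
  return sampleStates.every((g) => same(acc.merge(acc.init(), g), g));
}

// Hypothetical sum accumulator whose init() returns the neutral element.
const sumFromZero = {
  init: () => ({ sum: 0 }),
  merge: (a, b) => ({ sum: a.sum + b.sum }),
};

// Same merge(), but init() returns a non-neutral starting state.
const sumFromTen = { ...sumFromZero, init: () => ({ sum: 10 }) };

const samples = [{ sum: 0 }, { sum: 5 }, { sum: -3 }];
console.log(initIsNeutralForMerge(sumFromZero, samples)); // true
console.log(initIsNeutralForMerge(sumFromTen, samples));  // false
```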

Comment by Kyle Suarez [ 11/Jul/23 ]

N.B. the usage of "neutral" here to describe init() is that init() is a fixed point.

My hot take is that this is not something that seems like it should be prioritized but I am deferring to Product.

Generated at Thu Feb 08 06:34:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.