[DOCS-16193] [SERVER] Mention non-guaranteed order of $accumulator Created: 09/Jun/23  Updated: 22/Jan/24

Status: Backlog
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: David Percy Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: backlog, request
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 34 weeks, 2 days ago

 Description   

The $accumulator operator lets you define a custom accumulator using JavaScript: https://www.mongodb.com/docs/v6.0/reference/operator/aggregation/accumulator/

Part of the contract between the user and the server is that the server is free to decide the order and grouping when it calls init()/accumulate()/merge(), and so the user is responsible for making sure these functions are insensitive to order and grouping.

We do allude to this, because we document the conditions when merge() is called: https://www.mongodb.com/docs/v6.0/reference/operator/aggregation/accumulator/#merge-two-states-with--merge. But maybe we should be more explicit about the assumptions the server makes about the user's init()/accumulate()/merge() functions.

For example, here's a bad, grouping-sensitive $accumulator:

{$accumulator: {
   init: function () {return "a";},
   accumulate: function(state, arg) {return state + arg;},
   accumulateArgs: ["b"],
   merge: function(state1, state2) {return state1 + state2;},
   lang: "js"
}}

This accumulator is bad because it gives you a different answer depending on how the server chooses to do the grouping:

// It can group this way:
accumulate(accumulate(init(), "b"), "b") = ("a" + "b") + "b" = "abb"
 
// or this way instead:
merge(init(), accumulate(accumulate(init(), "b"), "b")) = "a" + (("a" + "b") + "b") = "aabb"

If you think something precise would be useful, I think this captures it:

// merge() is associative and commutative
merge(state1, merge(state2, state3)) == merge(merge(state1, state2), state3)
merge(state1, state2) == merge(state2, state1)
 
// init() is an identity
merge(init(), state) == state
merge(state, init()) == state
 
// accumulate() and merge() are related
accumulate(state, value) == merge(state, accumulate(init(), value))
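As a rough way to sanity-check a custom accumulator against these properties before deploying it, one could spot-check them on a handful of sample states and values. The sketch below is plain Node.js, not a server feature: violatedLaws, sumAccumulator and concatAccumulator are hypothetical names, the functions are assumed to be pure (they must not mutate their arguments), and passing on a few samples does not prove the properties hold in general.

```javascript
const { isDeepStrictEqual } = require("node:util");

// Hypothetical helper: returns the sorted names of the properties that
// fail for at least one combination of the given sample states and values.
// Assumes init/accumulate/merge are pure functions.
function violatedLaws({ init, accumulate, merge }, states, values) {
  const failed = new Set();
  for (const s1 of states) {
    // init() is an identity for merge().
    if (!isDeepStrictEqual(merge(init(), s1), s1) ||
        !isDeepStrictEqual(merge(s1, init()), s1)) {
      failed.add("identity");
    }
    // accumulate() and merge() are related.
    for (const v of values) {
      if (!isDeepStrictEqual(accumulate(s1, v), merge(s1, accumulate(init(), v)))) {
        failed.add("accumulate/merge");
      }
    }
    for (const s2 of states) {
      // merge() is commutative.
      if (!isDeepStrictEqual(merge(s1, s2), merge(s2, s1))) {
        failed.add("commutative");
      }
      // merge() is associative.
      for (const s3 of states) {
        if (!isDeepStrictEqual(merge(s1, merge(s2, s3)), merge(merge(s1, s2), s3))) {
          failed.add("associative");
        }
      }
    }
  }
  return [...failed].sort();
}

// A well-behaved accumulator: a running sum.
const sumAccumulator = {
  init: () => 0,
  accumulate: (state, value) => state + value,
  merge: (state1, state2) => state1 + state2,
};

// The grouping-sensitive accumulator from the description.
const concatAccumulator = {
  init: () => "a",
  accumulate: (state, arg) => state + arg,
  merge: (state1, state2) => state1 + state2,
};

console.log(violatedLaws(sumAccumulator, [0, 1, 5], [1, 2]));
console.log(violatedLaws(concatAccumulator, ["a", "ab", "abb"], ["b"]));
```

With these samples the sum accumulator reports no violations, while the string-concatenation accumulator fails every property except associativity (string concatenation is associative, but it is not commutative and "a" is not an identity for it).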



 Comments   
Comment by Sarah Olson [ 12/Jun/23 ]

Thanks david.percy@mongodb.com. We'll take a look as part of our backlog. 

Comment by David Percy [ 09/Jun/23 ]

alya.berciu@mongodb.com let me know if this is helpful.

Rereading the Slack thread which prompted this:

"If we don't end up with multiple states, I think a user would expect that we could just get away with not calling merge at all."

I think that's a fair description of the behavior today, but not something we guarantee, and not something users should try to predict. For example (just thinking now) in the future if we had more intraquery parallelism by default, that would be a new reason the server might choose to keep several states.

Generated at Thu Feb 08 08:14:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.